
Authors

Qingbiao Guan, Yutong Xie, Bing Yang, Jianpeng Zhang, Zhibin Liao, Qi Wu, Yong Xia

Abstract

Accurate automated segmentation of infected regions in CT images is crucial for predicting COVID-19’s pathological stage and treatment response. Although deep learning has shown promise in medical image segmentation, the scarcity of pixel-level annotations due to their expense and time-consuming nature limits its application in COVID-19 segmentation. In this paper, we propose utilizing large-scale unpaired chest X-rays with classification labels as a means of compensating for the limited availability of densely annotated CT scans, aiming to learn robust representations for accurate COVID-19 segmentation. To achieve this, we design an Unpaired Cross-modal Interaction (UCI) learning framework. It comprises a multi-modal encoder, a knowledge condensation (KC) and knowledge-guided interaction (KI) module, and task-specific networks for final predictions. The encoder is built to capture optimal feature representations for both CT and X-ray images. To facilitate information interaction between unpaired cross-modal data, we propose the KC that introduces a momentum-updated prototype learning strategy to condense modality-specific knowledge. The condensed knowledge is fed into the KI module for interaction learning, enabling the UCI to capture critical features and relationships across modalities and enhance its representation ability for COVID-19 segmentation. The results on the public COVID-19 segmentation benchmark show that our UCI with the inclusion of chest X-rays can significantly improve segmentation performance, outperforming advanced segmentation approaches including nnUNet, CoTr, nnFormer, and Swin UNETR. Code is available at: https://github.com/GQBBBB/UCI.
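
For readers unfamiliar with momentum-updated prototype learning, the sketch below illustrates the general idea in PyTorch-style Python: per-modality prototypes are assigned to incoming features by nearest-neighbour matching and then updated as an exponential moving average. All names, shapes, and the momentum value are illustrative assumptions, not the authors' KC implementation.

```python
import torch

def update_prototypes(prototypes, feats, momentum=0.9):
    """Momentum (EMA) update of modality-specific prototypes.

    prototypes: (K, C) tensor of current prototype vectors
    feats:      (N, C) tensor of token/patch features from one modality
    Returns the updated (K, C) prototypes. Purely illustrative; the paper's
    KC module may differ in assignment, normalisation, and update rule.
    """
    # Assign each feature to its nearest prototype (cosine similarity).
    p = torch.nn.functional.normalize(prototypes, dim=1)
    f = torch.nn.functional.normalize(feats, dim=1)
    assign = (f @ p.t()).argmax(dim=1)          # (N,)

    new_protos = prototypes.clone()
    for k in range(prototypes.size(0)):
        members = feats[assign == k]
        if members.numel() == 0:
            continue                             # no feature matched this prototype
        # Exponential moving average toward the mean of assigned features.
        new_protos[k] = momentum * prototypes[k] + (1.0 - momentum) * members.mean(dim=0)
    return new_protos
```

The condensed per-modality prototypes would then be passed to the KI module for cross-modal interaction learning.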

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_58

SharedIt: https://rdcu.be/dnwBS

Link to the code repository

https://github.com/GQBBBB/UCI

Link to the dataset(s)

https://covid-segmentation.grand-challenge.org/

https://cxr-covid19.grand-challenge.org/

https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method, Unpaired Cross-modal Interaction, to learn visual representations from limited densely annotated CT scans and abundant image-level annotated X-rays. The UCI framework aims to learn representations from both segmentation and classification tasks. It includes three main components: a multi-modal encoder for image representations, a knowledge condensation and interaction module for unpaired cross-modal data, and task-specific networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel setting is introduced, where CT scans with segmentation annotations and CXRs with classification labels are jointly leveraged to enhance the segmentation performance on CT scans.

    2. The implementation details are sufficient.

    3. The paper is well written and easy to read.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed setting sounds a little bit weird. I hope the authors can briefly clarify the application scenario of the proposed setting.

    2. Introducing a large number of annotated CXRs fails to provide notable performance gains: only ~1.7% over the baseline, as shown in Fig. 2.

    3. The baseline performance is low, making me doubt the applicability of the proposed method.

    4. I’m surprised that the authors did not include a discussion of cross-modal learning for medical imaging, such as [a, b].

    [a] Cao, Xiaohuan, et al. “Deep learning based inter-modality image registration supervised by intra-modality similarity.” Machine Learning in Medical Imaging: 9th International Workshop, MLMI 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Proceedings. Springer International Publishing, 2018.

    [b] Chen, Xiaoyu, et al. “MASS: Modality-collaborative semi-supervised segmentation by exploiting cross-modal consistency from unpaired CT and MRI images.” Medical Image Analysis 80 (2022): 102506.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The supplementary material provides architecture details for the encoder, decoder, and classification head. Therefore, I think the reproducibility may be satisfactory.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weakness part.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well written and the proposed method seems novel. My major concern lies in the practical value and application prospect of the proposed new settings (i.e., limited CT scans with rich annotations and lots of CXRs with image-level labels). Also, the authors failed to discuss the relation to cross-modal learning for medical imaging, which is very relevant to the context. Therefore, I recommend a weak accept and hope the authors could address my concerns in the rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents three key contributions. Firstly, it employs abundant X-ray images with image-level annotations to improve COVID-19 segmentation on limited CT scans, where the CT and X-ray data are unpaired and may have distributional differences; this is the first study to adopt such an approach. Secondly, it introduces a knowledge condensation module that uses momentum-updated prototype learning to condense modality-specific knowledge, together with a knowledge-guided interaction module that harnesses the learned knowledge to boost the representations of each modality. Finally, the experimental results demonstrate the effectiveness and strong generalizability of the UCI learning method for COVID-19 segmentation and its potential for related disease screening. The proposed framework can be a valuable tool for medical practitioners in detecting and identifying COVID-19 and other associated diseases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Interesting and important topic
    • Detailed flowchart
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Pre-processing steps are not fully described
    • The presentation of results is not comprehensive
    • Ablation studies are necessary
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Needs more details about the data, data preprocessing, and code implementation
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Although the paper reads quite well, it is not free from grammatical errors; professional proofreading is required before a possible print in a journal.
    • The abstract should be rewritten to clearly state the manuscript’s main focus.
    • The introduction needs to be rewritten so that the motivation, state-of-the-art approaches, and contributions are clear.
    • The comparison with other methods should be stated more clearly.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    -

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper entitled “Unpaired Cross-modal Interaction Learning for COVID-19 Segmentation on Limited CT Images” proposes a multimodal framework that leverages the relatively large amount of data available in one modality to compensate for the scarcity of samples in the other. The approach includes two modules that facilitate the interaction of the two modalities and capture critical feature representations in the interest of an accurate final segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper emphasizes the importance of leveraging the available resources, in this case data. Since CT scans are scarce and typically expensive, this lack of data could be alleviated by using other imaging protocols such as X-ray. In addition, the proposed framework covers several aspects related to the interaction of the modalities and to the feature representations needed to produce high-quality output segmentations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper needs a few minor revisions, for instance the inclusion of references in the first part of the introduction and the mention of the dataset in Table 1, in addition to some mathematical support for the use of multi-head self-attention to leverage the feature representations from the different modalities.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducing the proposed framework from the paper alone would not be an easy task. Availability of the code and a demo would be very much appreciated by the community.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It is a good paper, very well structured and easy to read. A few minor corrections are suggested in the previous responses. Beyond that, including qualitative results in the manuscript would help give a better idea of the performance of the proposed framework. An analysis of model efficiency in terms of FLOPs, trainable parameters, inference time, and the like is also recommended. A bit more mathematics would be very much appreciated, especially a justification for the use of the multi-head self-attention mechanism.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel framework for multimodal segmentation and is very well structured; the number of experiments presented in the paper could probably be larger. However, the authors provide a good analysis of the current limitations that inspired the framework.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Based on the author’s feedback, it appears that they have effectively addressed the major critiques and provided clarification where necessary, for example:

    Discuss Medical Cross-modal Learning (R1Q4): The author acknowledges the reviewer’s suggestions and commits to including additional cross-modal methods in the paper. They explain that some suggested methods require paired training data or pixel-level annotations, which are not applicable to their scenario involving the joint use of 3D CT scans and 2D X-rays.

    Ablation (R2Q3): The author acknowledges the space limitations and explains that they have discussed the effectiveness of each module, the number of prototypes, and the setting of the momentum factor in their ablation studies.

    Qualitative Results (R3Q1): The author acknowledges the space limit and indicates that the results of their UCI method and other segmentation models were visualized in the Supplementary section.

    Complexity (R3Q2): The author provides the FLOPS, number of parameters, and inference time of each method, including their UCI method, nnUNet, CoTr, nnFormer, and Swin UNETR.

    Based on the author’s thorough and appropriate responses to the reviewers’ comments, it is reasonable to accept the paper for publication. The author has addressed the concerns, provided additional clarification, and demonstrated the novelty and value of their proposed Unpaired Cross-modal Interaction (UCI) learning framework.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes an unpaired cross-modal interaction framework that uses annotated X-ray images to improve the segmentation of COVID-19 cases. The overall pipeline of this work is sound. Experiments on public datasets also demonstrate the performance of this method. As pointed out by the reviewers, this work lies in the formulation of cross-modal learning, which has been widely investigated in the MICCAI society. The current comparisons in this work are not sufficient to support the authors’ claims. Please carefully prepare your rebuttal to address the issues mentioned by the reviewers.




Author Feedback

We appreciate the reviewers for their invaluable comments and recognition of the novelty (R1, R3) and value (R2) of our Unpaired Cross-modal Interaction (UCI) learning framework. We will address these comments and apply careful revisions (e.g., necessary details, clarifications, citations, and formulas) to our final version. The code will be released upon publication.

R1Q1: Application Scenario
Our method applies to scenarios where a CT segmentation task (e.g., COVID-19 lesion segmentation) suffers performance issues from the limited availability of high-quality annotations (e.g., 149 scans in this study), while a vast number of more accessible and cost-effective X-ray images with image-level annotations are available for co-training the segmentation model. Such co-training is based on cross-modal knowledge transfer. Specifically, pathological knowledge about lung diseases is extracted from the X-ray classification task and used to enhance segmentation performance via knowledge condensation and interaction (KC&KI), which can transfer modal knowledge even when the multi-modal data are unpaired.

R1Q2: Limited Gains (1.7%) over Baseline
We applied a paired t-test to the DSC values obtained by our UCI and the baseline method on all (50) test samples and obtained a p-value of 0.035 (<0.05), indicating that our performance gain is statistically significant.

R1Q3: Low Baseline
Due to its advanced Transformer architecture and extensive data augmentation, the baseline is competitive with the well-known nnUNet (baseline: 0.6726 DSC, 122.30 HD, and 29.49 ASD vs. nnUNet: 0.6794 DSC, 132.55 HD, and 31.28 ASD). Note that we reduced the iterations from 250k to 80k in the ablation study for efficiency (Section 3.4), leading to lower results than those in Table 1. For this study, we incorporated the proposed KC&KI modules into the baseline, allowing the use of 2D X-ray images for co-training. Consequently, our UCI outperforms four established methods, including nnUNet, CoTr, nnFormer, and Swin UNETR.

R1Q4: Discuss Medical Cross-modal Learning
In the 3rd paragraph of the Introduction, we discussed various methods for multi-/cross-modal learning, including [4,10,11,18,19]. We appreciate the reviewer’s suggestions and will include more cross-modal methods such as Cao et al. (MLMI’18) and Chen et al. (MedIA’22). However, it should be noted that Cao’s method targets image registration and requires paired training data, and Chen’s method requires pixel-level annotations, while our method can use X-ray images with only image-level labels for training. Additionally, our scenario involves cross-dimension issues (i.e., jointly using 3D CT scans and 2D X-rays), which cannot be addressed by either method.

R2Q1: Pre-processing
We follow the preprocessing pipeline of nnUNet.

R2Q2: Comparison
We compared our UCI with four prevalent or SOTA medical image segmentation methods. nnUNet is well known for its outstanding performance. CoTr is one of the first methods to blend CNN and Transformer for multiscale segmentation. nnFormer is a Transformer-based method with solid results across multiple datasets. Swin UNETR was proposed for 3D brain tumor segmentation and is also the SOTA on the MSD and BTCV Segmentation Challenge datasets.

R2Q3: Ablation
Due to the space limit, we only discussed the effectiveness of each module, the number of prototypes, and the setting of the momentum factor in our ablation studies (see Sec. 3.4).

R3Q1: Qualitative Results
Due to the space limit, the results of our UCI and other segmentation models were visualized in Fig. 2 in the Supplementary.

R3Q2: Complexity
Here are the FLOPS, number of parameters, and inference time of each method:

Methods       FLOPS (e9)   Params (e6)   Inference time (s)
nnUNet        691          45            29
CoTr          753          43            32
nnFormer      386          149           35
Swin UNETR    329          62            79
Ours          397          62            42
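
As a side note on the paired t-test mentioned in R1Q2, such a test on per-case DSC values can be run with SciPy roughly as follows. The arrays below are placeholder values for illustration only, not the study’s actual scores.

```python
import numpy as np
from scipy import stats

# Per-case Dice scores on the 50 test scans (placeholder values).
dsc_baseline = np.random.default_rng(0).uniform(0.55, 0.75, size=50)
dsc_uci = dsc_baseline + np.random.default_rng(1).normal(0.017, 0.03, size=50)

# Paired t-test: are the per-case differences significantly non-zero?
t_stat, p_value = stats.ttest_rel(dsc_uci, dsc_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # p < 0.05 -> significant gain
```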




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have carefully addressed the issues raised by the reviewers. Therefore, I recommend the acceptance of this submission.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to combine unpaired CXRs with classification labels and CT scans with segmentation labels to boost segmentation accuracy. The paper is mostly well written and easy to understand. The experiments show good improvements, supported by a detailed ablation study. However, I have concerns about whether the application scenario is realistic: although there is more public CXR data than CT data, why would this be the case for hospitals? Another concern is how to make knowledge condensation differentiable, since it searches for the nearest neighbor. I hope the authors can address these points in the final version if the paper is accepted.
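
    Regarding the differentiability question, one common workaround (not necessarily what the authors use) is to replace the hard nearest-neighbour lookup with a soft, temperature-controlled assignment so that gradients flow to both features and prototypes. A minimal sketch with assumed names:

    ```python
    import torch

    def soft_condense(feats, prototypes, tau=0.1):
        """Soft prototype assignment as a differentiable stand-in for nearest-neighbour.

        feats: (N, C) features; prototypes: (K, C). Returns (N, C) condensed
        features as convex combinations of prototypes. Illustrative only.
        """
        sim = torch.nn.functional.normalize(feats, dim=1) @ \
              torch.nn.functional.normalize(prototypes, dim=1).t()  # (N, K) cosine similarities
        weights = torch.softmax(sim / tau, dim=1)                    # soft assignment weights
        return weights @ prototypes                                  # (N, C)
    ```

    As tau shrinks, the soft weights approach the hard nearest-neighbour assignment; a straight-through estimator is another standard option.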



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses my concerns. The authors responded in detail to the doubts about medical cross-modal learning, the ablation studies, and the limited gains over a low baseline. The only deficiency of the rebuttal is that its formatting is very messy, which makes it hard to read.


