Authors

Shengbo Gao, Ziji Zhang, Jiechao Ma, Zihao Li, Shu Zhang

Abstract

Semi-supervised learning has become increasingly popular in medical image segmentation due to its ability to leverage large amounts of unlabeled data to extract additional information. However, most existing semi-supervised segmentation methods only focus on extracting information from unlabeled data, disregarding the potential of labeled data to further improve the performance of the model. In this paper, we propose a novel Correlation Aware Mutual Learning (CAML) framework that leverages labeled data to guide the extraction of information from unlabeled data. Our approach is based on a mutual learning strategy that incorporates two modules: the Cross-sample Mutual Attention Module (CMA) and the Omni-Correlation Consistency Module (OCC). The CMA module establishes dense cross-sample correlations among a group of samples, enabling the transfer of label prior knowledge to unlabeled data. The OCC module constructs omni-correlations between the unlabeled and labeled datasets and regularizes dual models by constraining the omni-correlation matrix of each sub-model to be consistent. Experiments on the Atrial Segmentation Challenge dataset demonstrate that our proposed approach outperforms state-of-the-art methods, highlighting the effectiveness of our framework in medical image segmentation tasks. The code, pre-trained weights, and data are publicly available.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_10

SharedIt: https://rdcu.be/dnwb8

Link to the code repository

https://github.com/Herschel555/CAML

Link to the dataset(s)

https://github.com/Herschel555/CAML

Reviews

Review #3

Please describe the contribution of the paper

This paper studies the semi-supervised medical image segmentation task. Specifically, the paper proposes a novel Correlation Aware Mutual Learning (CAML) framework that leverages labeled data to guide the extraction of information from unlabeled data. The motivation is clear and the experiments demonstrate the effectiveness of the method. Although discussion of some related work is missing, I think it is still a good paper and can be accepted with revision. I would be glad to raise my rating if the authors answer my questions well.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Pros:
1. The writing and organization are acceptable;
2. The motivation is clear and makes sense.
3. Experiments can demonstrate the effectiveness of the proposed method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Weakness:
1. My biggest concern is that the main idea is very similar to [1]. [1] is the first to utilize the reliable information of labeled images to guide the learning of unlabeled images. However, it is not discussed in this paper. And the formulations of your consistency regularization are also very similar to that of [1], which heavily limits the novelty of this paper. The authors should discuss [1] in the revision and highlight the difference.
2. The implementation is based on CPS [2], but the results of CPS are not presented in the experiments. I think CPS can be easily applied to this task; is that right?
3. Code should be made available for potential follow-up work.
References:
1. Querying Labeled for Unlabeled: Cross-Image Semantic Consistency Guided Semi-Supervised Semantic Segmentation. TPAMI, 2022
2. Semi-supervised semantic segmentation with cross pseudo supervision. CVPR, 2021
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method is not easy to reproduce. Thus, it will be better to release the code for reproduction.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Recommendation: The authors should discuss the most closely related papers and highlight the differences between their method and others, which would make the novelty clear.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The motivation is clear and the method is intuitive to me.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper
The paper describes a novel method of semi-supervised learning based on the concept of guiding the latent representation of unlabelled data closer to that of the labelled data, as opposed to the other way around. The authors’ contributions based on a teacher-student paradigm are two-fold:
- a cross-sample mutual attention module, which consists of inter-sample (within a batch) and intra-sample attention layers
- an omni-correlation consistency regularisation, where latent representations of both the labelled and unlabelled images are compared to those of a memory bank of labelled latent representations
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- novelty: the idea of solely comparing unlabelled latent representations to the memory bank, hence incentivising unlabelled features to be similar to labelled features, is novel and an interesting approach
- novelty: the cross-sample attention efficiently combines two types of attention
- clarity: the paper is well structured, and the methodology is well explained
- empirical evidence: the method is extensively validated through comparison to 7 other semi-supervised methods, over multiple labelled-unlabelled dataset splits and over multiple seeds
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- context: previous work on semi-supervised learning used memory banks [1]. The authors should highlight how their method differs from theirs.
Please rate the clarity and organization of this paper

Excellent

Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The implementation details are well explained. The choice of N for the memory bank (N x Dlabelled x C) is not reported, nor is it explained how the memory bank picks the first N pixels to store.

Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The authors could further improve their paper by ablating on the requirement of the memory bank, that is simply comparing z_v to z_p. In addition, running experiments on different datasets would strengthen the paper further.

“Please refer to [17] for a detailed description of the CPS module and loss design of Ls and lc.” I believe the correct citation is [3]
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The contributions of the paper are novel, well-justified and extensively validated; the paper is well-written.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #1

Please describe the contribution of the paper

This paper proposed a semi-supervised segmentation framework named Correlation Aware Mutual Learning (CAML). In addition to the cross-pseudo supervision which was explored in ref [17], the authors proposed two other components: Cross-sample Mutual Attention (CMA) and Omni-Correlation Consistency (OCC). CMA exploits mutual attention along spatial dimension and batch dimension in the mini-batch simultaneously. And the OCC enforces consistency in omni correlation between the two branches. The omni correlation is calculated between the feature embedding from unlabeled data and labeled data (which was stored as prototypes in the memory bank).

With the OCC and CMA, the proposed CAML achieved significant improvement over the previous SOTA framework, especially when only a few labeled scans (n=4, 5%) are available. When 16 labeled scans are available, the improvement is negligible.
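
The omni-correlation consistency summarized in this review can be illustrated with a small NumPy sketch. It is not the authors' implementation: the cosine similarity, the temperature value, the symmetric KL divergence used as the consistency term, and the function names `omni_correlation` and `occ_loss` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def omni_correlation(z, bank, temperature=0.1):
    """Correlate embeddings z (M, C) with labeled prototypes bank (K, C).

    Returns an (M, K) matrix whose rows sum to 1. Cosine similarity and
    the temperature are assumptions, not values taken from the paper.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return softmax(z @ bank.T / temperature, axis=1)

def occ_loss(z_v, z_a, bank, eps=1e-8):
    """Consistency between the two branches' omni-correlation matrices.

    z_v and z_a are embeddings of the same voxels from the vanilla and
    auxiliary branches. Symmetric KL is an assumed choice of distance.
    """
    p = omni_correlation(z_v, bank)
    q = omni_correlation(z_a, bank)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))
```

In this sketch the loss is zero when both branches produce identical embeddings and grows as their correlations with the labeled prototypes diverge.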
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Novelty in exploiting labeled data through memory bank and inter-/intra- sample cross attention. The framework proposed not only exploits unlabeled data through cross-pseudo supervision, but it also exploits labeled data through OCC and CMA modules. The extensive comparisons with previous studies indicate superior performance when the available labeled scans are extremely limited.
2. The contribution of each component is well justified and supported by ablation studies.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. A few errors/typos/unclear points: a. What’s the DC-Net in Fig. 2? Is it the proposed CAML? b. What’s the batch former in Table 2? Is it the proposed CMA module?
2. No justification is provided for applying CMA only in the auxiliary model rather than in both the vanilla and auxiliary models. Otherwise, an ablation study should be conducted for that.
3. Compared to the detailed explanation of OCC, the formulation and definition of CMA are not clearly provided.
4. How is the memory bank updated? Is it updated through exponential moving average? It is not provided in the manuscript.
5. Do you require spatial alignment for the CMA? Does the inter-sample self-attention only work under the assumption that the data have the exact same field of view and are anatomically aligned? If so, it should be included as a limitation in the conclusion.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducibility is good, data is public, and code is open-source.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. The clarity should be improved. Please see my detailed comments provided in the weakness section.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I would recommend a strong acceptance of the paper, given the significant improvement when the labeled scans are extremely scarce and the novelty of CMA and OCC. But I found some parts confusing, vague, or missing, so I would suggest authors improve on those aspects.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposed a semi-supervised segmentation method by utilizing Cross-sample Mutual Attention (CMA) for mutual attention along spatial dimension and batch dimension in the mini-batch and Omni-Correlation Consistency (OCC) for consistency in omni-correlation between the two branches. The paper is well written and the reviewers consistently rated this paper positively. Accept. The authors should address the concerns and questions raised by the reviewers in the final paper.

Author Feedback

We appreciate the thoroughness with which you have examined our work. We will address each of your comments and make the necessary revisions to improve the manuscript.

Reviewer 1:

Regarding the question about DC-Net and the batch former, we apologize for the typos in the manuscript. DC-Net refers to the proposed CAML, and the batch former refers to the proposed CMA module. We will adjust the terminology in the revised manuscript to avoid confusion.

The reasons for applying CMA only in the auxiliary model are two-fold. First, the CMA module requires a batch size larger than 1 to model the attention among samples within a mini-batch. For model inference, we still need a vanilla V-Net to run inference on each sample independently (batch size = 1). Moreover, we model the vanilla and the auxiliary branches with different architectures to increase architectural heterogeneity for better performance in a mutual learning framework. We will clarify this point in the revised manuscript.

With regard to the formulation and definition of the CMA module, a detailed introduction can be found in the second paragraph of Sec. 2.2, and an illustration of the module is provided in the bottom part of Fig. 1. Furthermore, we have made the code for CAML publicly available, where you can find the implementation of the CMA module.

We apologize for not explicitly mentioning how the memory bank is updated. Following MoCo, we update the memory bank in a queue-like manner, in which its entries are directly replaced by the embeddings of labeled data predictions.
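
One plausible reading of this answer is a MoCo-style first-in-first-out bank whose entries are overwritten directly, with no exponential moving average. The NumPy sketch below shows that reading only; the class `PrototypeQueue`, its capacity, and its method names are hypothetical and are not taken from the CAML repository.

```python
import numpy as np

class PrototypeQueue:
    """Fixed-capacity FIFO bank of labeled embeddings (hypothetical helper)."""

    def __init__(self, capacity, dim):
        self.bank = np.zeros((capacity, dim), dtype=np.float32)
        self.ptr = 0    # next slot to overwrite
        self.size = 0   # number of valid entries stored so far

    def enqueue(self, embeddings):
        # Overwrite slots in arrival order, wrapping around when full.
        embeddings = np.asarray(embeddings, dtype=np.float32)
        capacity = self.bank.shape[0]
        embeddings = embeddings[-capacity:]  # keep only the newest rows
        n = embeddings.shape[0]
        idx = (self.ptr + np.arange(n)) % capacity
        self.bank[idx] = embeddings
        self.ptr = int((self.ptr + n) % capacity)
        self.size = min(capacity, self.size + n)

    def prototypes(self):
        # Valid entries only; before the first wrap these are the first `size` rows.
        return self.bank[:self.size]
```

Direct replacement keeps the stored prototypes identical to recent labeled embeddings, whereas an exponential moving average would blend old and new values.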

Regarding the question about spatial alignment for the CMA module, we want to clarify that the CMA module does not require spatial alignment. By cascading intra- and inter-sample self-attention modules, the CMA module establishes mutual attention among each voxel in a group of samples, even if they have different fields of view and anatomical alignment.
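
The cascade described here can be sketched in NumPy as follows. This is a simplified illustration rather than the released CMA module: it assumes features flattened to shape (B, N, C), identity query/key/value projections, single-head attention, and residual connections, and the names `self_attention` and `cross_sample_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product attention over axis 1 of x with shape (G, T, C).

    Identity projections are used for brevity (an assumption); a real
    module would learn query, key, and value weights.
    """
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def cross_sample_attention(feats):
    """Cascade intra-sample then inter-sample attention on feats (B, N, C)."""
    # Intra-sample: each voxel attends to all voxels of the same sample.
    intra = feats + self_attention(feats)
    # Inter-sample: swap axes so attention runs across the batch dimension.
    swapped = intra.transpose(1, 0, 2)           # (N, B, C)
    inter = swapped + self_attention(swapped)
    return inter.transpose(1, 0, 2)              # back to (B, N, C)
```

Because the intra-sample step has already mixed information across all voxels of a sample, the subsequent inter-sample step lets a voxel in one sample be influenced by any voxel of another sample, which is the sense in which the cascade avoids relying on position-wise correspondence.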

Reviewer 2:

Regarding the parameter choice N in OCC memory bank, as explained in the Memory Bank Construction section, N corresponds to the number of labeled training samples. If you are referring to the parameter n, it corresponds to the number of labeled prototypes in the OCC memory bank, which is explained in the Embeddings Sampling section in detail, and reported exactly in Implementation Details.

We appreciate your suggestion regarding the ablation of the OCC module and conducting further experiments on different datasets. Due to the length limitations of the manuscript, our primary objective was to present the motivation and effectiveness of the proposed CAML. However, we acknowledge the importance of extensive experiments and will incorporate them in future work.

Thank you for pointing out the incorrect citation. We will rectify it in the camera-ready version.

Reviewer 3:

We appreciate your concern about the similarity between our approach and CISC-R. Although the motivation behind CAML is similar to CISC-R, there are two fundamental differences that set CAML apart. Firstly, CAML adopts a cross pseudo-supervision (CPS) framework, which is an end-to-end approach, whereas CISC-R follows a multi-stage self-training framework. Secondly, instead of directly revising the embedded features of unlabeled data, the OCC module in CAML constrains the omni-correlation matrix of each sub-model to ensure consistency throughout the framework. We will further clarify the difference between CAML and CISC-R in the revised manuscript.

Regarding the implementation based on CPS, we would like to clarify that CAML is built upon the MC-Net, which is a CPS framework with minor yet effective modifications. We compared the performance of CAML with MC-Net in the experiments, so it is not necessary to compare CAML with CPS.

Meta-reviewer: Thanks for your positive comments and suggestions. We will incorporate these revisions into the camera-ready version to enhance the clarity and comprehensiveness of our work.


Correlation-Aware Mutual Learning for Semi-supervised Medical Image Segmentation