Authors
Yi Lin, Yufan Chen, Kwang-Ting Cheng, Hao Chen
Abstract
Medical image segmentation has made significant progress in recent years. Deep learning-based methods are recognized as data-hungry techniques, requiring large amounts of data with manual annotations. However, manual annotation is expensive in the field of medical image analysis, which requires domain-specific expertise. To address this challenge, few-shot learning has the potential to learn new classes from only a few examples. In this work, we propose a novel framework for few-shot medical image segmentation, termed CAT-Net, based on cross masked attention Transformer. Our proposed network mines the correlations between the support image and query image, limiting them to focus only on useful foreground information and boosting the representation capacity of both the support prototype and query features. We further design an iterative refinement framework that refines the query image segmentation iteratively and promotes the support feature in turn. We validated the proposed method on three public datasets: Abd-CT, Abd-MRI, and Card-MRI. Experimental results demonstrate the superior performance of our method compared to state-of-the-art methods and the effectiveness of each component. Upon publication of this paper, we will release the source code for our method.
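As a rough illustration of the prototype-based matching the abstract alludes to (a support prototype compared against query features), the following minimal Python sketch assumes PANet-style masked average pooling and cosine similarity. The function names, the toy feature vectors, and the 0.5 threshold are illustrative assumptions. They are not taken from the paper or its code release.

```python
import math

def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask == 1."""
    selected = [f for f, m in zip(features, mask) if m == 1]
    if not selected:
        raise ValueError("mask selects no foreground positions")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def segment_query(query_features, prototype, threshold=0.5):
    """Label a query position as foreground if it is similar to the prototype."""
    return [1 if cosine_similarity(f, prototype) > threshold else 0
            for f in query_features]

# Toy example: three support positions (two foreground), two query positions.
support_features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
support_mask = [1, 1, 0]
prototype = masked_average_pooling(support_features, support_mask)

query_features = [[0.9, 0.1], [0.1, 0.9]]
query_mask = segment_query(query_features, prototype)
print(prototype, query_mask)
```

In this toy setting, only the query position whose feature points in the same direction as the support foreground is labeled as foreground.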
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43895-0_22
SharedIt: https://rdcu.be/dnwx4
Link to the code repository
git@github.com:hust-linyi/CAT-Net.git
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose a novel framework for few-shot medical image segmentation, named CAT-Net, based on a cross masked attention Transformer. The proposed network mines the correlations between the support image and query image, and it can be applied iteratively to continually refine the segmentation performance. For evaluation, they use three publicly available datasets and demonstrate state-of-the-art performance.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper proposes an effective approach that focuses on the interaction between support and query features.
- As can be seen from the experimental section, it achieves state-of-the-art performance on three public datasets. The proposed Iterative Refinement framework also aims to strengthen the above ideas.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The explanation about the prototype is not clear. Although it looks the same as PANet, it can confuse readers who are not familiar with this domain (few-shot learning).
- The proposed approach lacks justification. Why does such an exchange work? Why exchange the Q, not K or V?
- A comparison with last year’s MICCAI paper “Few Shot Medical Image Segmentation with Cross Attention Transformer”, which achieves results competitive with yours, is missing.
- The authors should provide 5-shot or 10-shot experiments to claim the “few-shot” setting. Adding experiments on other parts of the body using reliable public CT/MRI datasets would also help illustrate the generalizability of the method beyond easily segmented organs such as the liver and kidney.
- The inference process is not clear. It would be better if the figures in the supplementary material were presented in the main text.
- Typo: In section 2.4, in the paragraph “Cross Masked Attention Module”, K^v, Q^v, V^v should presumably be K^s, Q^s, V^s.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
No code
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Please provide a justification for the proposed method.
- More experiments on 5-shot or 10-shot are needed to prove that the proposed method is feasible for few-shot tasks.
- I will recommend acceptance if the questions are well answered.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The novelty and the justification of the proposed method.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
This approach is effective because it uses mutual information between queries and targets. While the clarification about exchanging Q and K/V is not clear, the quality of the work is somewhat above the acceptance threshold.
Review #3
- Please describe the contribution of the paper
The authors have developed a novel network for few-shot medical image segmentation called CAT-Net based on cross masked attention Transformer that includes an iterative refinement framework validated on 3 public datasets including CT of the abdomen, MRI of the abdomen, and cardiac MRI. The computational premise of few-shot segmentation is thoroughly described herein, notably diagrammed in Figure 1 and shown in action in Figure 2. Multiple other networks are compared to CAT-Net and found to have smaller Dice scores, although no statistical comparison is made to quantify whether this difference is significant.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Well-described, plausible module to accomplish tedious segmentation with a few-shot platform.
- Reasonably understandable mathematical description of the approach employed.
- Figure 2 is an excellent depiction of CAT-Net in action and is very effective to demonstrate its efficacy.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The included comparative networks and CAT-Net do not differ much, and it is unclear whether CAT-Net performs significantly better.
- Several spelling errors are noted throughout the abstract (e.g. trainig, parametirc)
- The Ablation study is unclear. In a clinical sense, this represents a specific procedure that seems to not be implied here but can be relevant given the need for radiographic guidance for this procedure.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Many of the queries listed in the reproducibility responses are thoroughly addressed in the manuscript.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Is it possible for the differences found on Dice analysis of the various networks against CAT-Net to be compared for statistical significance? Empirically, it appears CAT-Net performs better but it remains unclear to what degree. If not superior in this regard, why would CAT-Net be the optimal platform for use? Does it run on less data and thus performs more nimbly compared to the other networks?
- Interestingly, the Dice % plateaued after 4 iterations of the CMAT module. Can the authors expound on why this may be the case? Could this inform ways to augment CAT-Net in future versions to improve its accuracy?
- The upcoming plan to pursue 3D imaging is sound but reference to bridging toward a clinical application would have been preferred.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Thorough explanation of how the CAT-Net platform functions with excellently depicted proof of concept in Figure 2. Well-selected networks provided for comparison with the novel few-shot segmentation platform developed herein. The specifics of whether the CAT-Net platform performs better than the other networks are not provided. The upcoming plan to pursue 3D imaging is sound but reference to bridging toward a clinical application would have been preferred.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #4
- Please describe the contribution of the paper
The paper proposes a few-shot medical image segmentation framework with
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The technical presentation and logic are good.
- The quantitative and qualitative results are robust. The proposed method outperforms or achieves competitive performance with the baselines.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The major concern comes from the novelty. The technical contributions are incremental. For example, cross attention mechanism, prototype learning in segmentation, and iterative refinement are not new concepts. My understanding is that the authors integrate these techniques efficiently for the task of interest. But I didn’t mean this integration is not acceptable.
The authors are encouraged to cite a few relevant papers, e.g. [1], and highlight the contributions.
[1] CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, Chen et al., ICCV 2021
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
With more details, I think the paper can be properly reproduced.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
In addition to the main weaknesses in 6, I have the following questions or concerns.
- In Sec. 2.3, the authors use a prior mask. I checked both the main text and the supplementary material but could not find a detailed description of this prior mask. How do the authors get it?
- In Eq. (2), I would suggest using the full representation of A(Q, K) instead of A.
- Right after Eq. (3): “We expend and flatten the binary query mask M^q to xxx”. This part is unclear to me. 2.1: expend -> expand. 2.2: In Eq. (4), I think a “)” is missing on the right.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The major factor is the technical novelty. I am willing to upgrade my score dependent on the rebuttal.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #5
- Please describe the contribution of the paper
In this paper, the authors present a novel support and query few-shot medical image segmentation method in which a cross attention transformer is introduced to mine the support->query and query->support correlations. Experiments on three benchmark datasets prove the effectiveness of the method.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-written and well-motivated to use the cross attention transformer to mine the support->query and query->support correlations.
- Experiments are extensive, with good results compared to the baseline methods on three different datasets.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The CMAT block and the cascaded refinement strategy can increase the computation burden. To make a fair comparison, the authors can report the FLOPs and parameter counts of CAT-Net alongside those of the baseline methods.
- The Segment Anything Model (SAM) from FAIR has recently shown superior performance in terms of interactive segmentation. The authors can discuss and compare this work with SAM models.
- In terms of the Prototypical Segmentation Module, the idea has been explored by a number of prior works; the authors can discuss how to integrate these works to further improve the performance. To list a few: Xu Z, et al. All-around real label supervision: Cyclic prototype consistency learning for semi-supervised medical image segmentation[J]. IEEE Journal of Biomedical and Health Informatics, 2022. Dong J, et al. Multi-Scale Prototype Constraints with Relation Aggregation for Semi-supervised Medical Image Segmentation[C]. IEEE BIBM, 2022. Pan W, et al. Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation[C]. IPMI, 2023.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Good. The authors have promised to make their code available, and the experiments are done on public datasets.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
See Weakness.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The experiments are extensive, with performance superior to the baselines.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
The authors have addressed my previous concerns. I am happy to accept the paper.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The authors have proposed a framework for few-shot medical image segmentation, CAT-Net, using a cross masked attention Transformer. They propose to mine the correlation between support and query images, which is then iteratively applied to continually refine the segmentation performance. Three publicly available datasets have been used to evaluate performance. Reviewers see some merit in the work but raise concerns around method novelty, clarity of description, justification of approach and experiments.
Author Feedback
We appreciate the reviewers for their constructive comments and acknowledgment of our strengths, such as “effective approach” (R1), “novel network, well-described” (R3), “results are robust” (R4), and “well-written and well-motivated” (R5). We’d like to address the reviewers’ concerns and suggestions as follows.
Motivation & Novelty (R3, R4)
To the best of our knowledge, this is the first work to explore medical image few-shot segmentation (FSS) from the perspective of the mutual interaction between support and query features. We introduce a Cross Masked Attention Transformer (CMAT) module to capture feature correspondence and remove redundant backgrounds. This module, coupled with an iterative refinement strategy, improves segmentation mask accuracy. CMAT can also enhance existing FSS methods as a plug-and-play module, suggesting its potential to inspire future FSS research.
Method & Effectiveness
R1: Why does the exchange work? Why exchange the Q? With CMAT, we transfer knowledge between the support and query images, extracting their relevant regions. Using the segmentation mask further limits the region of interest, leading to more accurate segmentation results. We use the query (Q) as the agent through which the two branches exchange information: Q and K compute the attention matrix, which is then multiplied with V to give the final score.
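The attention computation described in this answer can be sketched as follows. This is only a minimal illustration assuming standard scaled dot-product attention, with query-image features acting as Q, support-image features as K and V, and the support foreground mask removing background positions before the softmax. The function names and toy values are hypothetical, and the actual CMAT module may use learned projections and a different masking scheme.

```python
import math

def softmax(scores):
    """Softmax that assigns zero weight to positions scored -inf."""
    finite = [s for s in scores if s != -math.inf]
    if not finite:
        return [0.0] * len(scores)
    peak = max(finite)
    exps = [math.exp(s - peak) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_masked_attention(queries, keys, values, key_mask):
    """Each query row attends only to key/value rows whose mask is 1."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product score; masked-out keys are excluded via -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if m == 1 else -math.inf
            for k, m in zip(keys, key_mask)
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

# Toy example: one query-image position attends to three support positions,
# the last of which is background and therefore masked out.
query_feats = [[1.0, 0.0]]
support_feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
support_mask = [1, 1, 0]
attended = cross_masked_attention(query_feats, support_feats,
                                  support_feats, support_mask)
print(attended)
```

Swapping the roles of the two images (support features as Q, query features as K and V, with the predicted query mask) would give the other direction of the exchange under the same assumptions.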
R4: Describe prior mask. Sorry for the confusion. The prior mask is the segmentation mask of the query image in MIFE. We will clarify this in the final version.
R5: Discuss other methods. We will discuss these methods in the final version.
Experiments
R1: Missing comparison with “Few Shot …”. We apologize, but there appears to be a misunderstanding. We have compared our method with recent SOTAs, including this work.
R1: Provide 5-shot or 10-shot experiments, and add experiments on other organs. We add 5-shot and 10-shot experiments compared with the second-best method on three datasets:
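Abd-CT

| Setting | Method | LK | RK | Spl. | Liv. | Avg. |
|---|---|---|---|---|---|---|
| 5-shot | Q-Net | 66.82 | 60.23 | 65.98 | 76.46 | 67.37 |
| 5-shot | Ours | 66.65 | 62.87 | 68.78 | 77.84 | 69.04 |
| 10-shot | Q-Net | 67.75 | 61.31 | 67.30 | 78.31 | 68.67 |
| 10-shot | Ours | 67.54 | 63.29 | 68.84 | 79.69 | 69.84 |

Abd-MRI

| Setting | Method | LK | RK | Spl. | Liv. | Avg. |
|---|---|---|---|---|---|---|
| 5-shot | Q-Net | 76.08 | 80.35 | 70.96 | 80.63 | 77.01 |
| 5-shot | Ours | 76.32 | 80.98 | 71.66 | 81.02 | 77.50 |
| 10-shot | Q-Net | 77.21 | 81.01 | 72.34 | 81.67 | 78.06 |
| 10-shot | Ours | 77.78 | 81.84 | 73.12 | 82.74 | 78.87 |

CMR-MRI

| Setting | Method | LV-B | LV-M | RV | Avg. |
|---|---|---|---|---|---|
| 5-shot | Q-Net | 69.35 | 90.26 | 80.68 | 80.10 |
| 5-shot | Ours | 69.06 | 91.94 | 80.81 | 80.61 |
| 10-shot | Q-Net | 70.76 | 90.88 | 81.39 | 81.01 |
| 10-shot | Ours | 70.39 | 92.67 | 81.96 | 81.67 |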
Our method outperforms Q-Net on average in all settings. Our experiments also include other organs, such as the spleen and ventricle, in Table 1 and Fig. 2.
R3: Adding statistical significance. We will add the P-values in the final manuscript. Our method outperforms SOTAs with P-value < 0.05.
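One way such P-values could be obtained is a paired test on per-case Dice scores. The sketch below assumes an exact sign-flip permutation test, which is not necessarily the test the authors used. The function name and the Dice values are made up for illustration and are not results from the paper.

```python
from itertools import product

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided paired permutation test on the summed difference.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every sign pattern is enumerated (feasible for small n).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / total

# Made-up per-scan Dice values for two methods on the same six scans.
dice_method_a = [0.80, 0.82, 0.78, 0.85, 0.81, 0.79]
dice_method_b = [0.78, 0.79, 0.77, 0.80, 0.78, 0.75]
p_value = paired_sign_flip_p_value(dice_method_a, dice_method_b)
print(f"two-sided p-value: {p_value:.5f}")
```

Because the enumeration grows as 2^n, larger test sets would need random sampling of sign patterns or a parametric alternative such as a paired t-test.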
R3: Why the Dice plateaued after 4 iterations of the CMAT module? Through multiple iterations, the model captures the features’ correlation, leading to improved segmentation. Yet, performance plateaus after a certain point. We argue the optimal iteration number may differ across datasets and tasks.
R5: FLOPs and parameters.

| Method | FLOPs | Para. |
|---|---|---|
| ALP-Net | 91.48G | 44.30M |
| Q-Net | 266.87G | 46.12M |
| Ours | 185.16G | 50.82M |
The whole network is trained using a single NVIDIA 3090 GPU with 12.8 GB memory. Training time is roughly 4 hours. Inference time is 1.44 sec/image. The increase in computation cost of our method is marginal.
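To put the reported figures in context, the ratios between methods can be computed directly. The snippet below is a small arithmetic sketch that uses only the FLOPs and parameter counts quoted in this rebuttal. The dictionary layout and function name are illustrative.

```python
# (FLOPs in G, parameters in M) as quoted in the rebuttal.
costs = {
    "ALP-Net": (91.48, 44.30),
    "Q-Net": (266.87, 46.12),
    "Ours": (185.16, 50.82),
}

def relative_cost(method, baseline):
    """Return (FLOPs ratio, parameter ratio) of method over baseline."""
    flops, params = costs[method]
    base_flops, base_params = costs[baseline]
    return flops / base_flops, params / base_params

for baseline in ("ALP-Net", "Q-Net"):
    flops_ratio, param_ratio = relative_cost("Ours", baseline)
    print(f"Ours vs {baseline}: {flops_ratio:.2f}x FLOPs, "
          f"{param_ratio:.2f}x parameters")
```

On these numbers, the proposed model needs roughly twice the FLOPs of ALP-Net but fewer than Q-Net, with about 10 to 15 percent more parameters than either.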
R5: Compare with SAM models. We compare with MedSAM[*] on Abd-MRI:
| Method | LK | RK | Spl. | Liv. | Avg. |
|---|---|---|---|---|---|
| MedSAM | 68.59 | 73.61 | 67.14 | 76.87 | 71.55 |
| Ours | 74.01 | 78.90 | 68.83 | 78.98 | 75.18 |

Our method beats MedSAM on all organs. We see the potential for further refining larger models, and we’ll explore it in future research.
[*] Segment Anything in Medical Images, arXiv, 2023.
Future Plan
R3: Clinical application. We’ll investigate more clinical applications, such as rare diseases and malformed organs, where data and annotations are scarce and costly.
Typos
We thank the reviewers for pointing out the typos. We will correct them in the final manuscript.
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
I am happy with the rebuttal as it has addressed major concerns raised especially by R1, and other reviewers. Authors have done a good job with the rebuttal with 2 reviewers upgrading their scores.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The paper proposes a method for few-shot learning in CT and MRI images using attention. The reviewers raised several concerns about the novelty of the work and asked for some clarification regarding some of the terms. The authors provided a proper rebuttal and even (although not necessary) include results on additional experiments in their rebuttal. Two reviewers have raised their rating from 5 to 6, and 4 to 5, respectively. Based on these reviews and rebuttals, I’d suggest acceptance for this work.
As a small sidenote: I’d recommend the authors use the original names of the datasets that they have used (e.g., CHAOS) out of respect for the composers of those data sets.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This paper introduces a novel few-shot medical image segmentation framework based on cross masked attention transformer. Reviewers enquired about method novelty and justification, as well as further bolstering of experimental results. In my view, the major concern was one of technical novelty as the ideas of cross-attention, prototype learning and iterative refinement have been previously explored. The author feedback addresses this by highlighting the novelty in use of mutual information between support and query features.
Hence, my recommendation is to accept this paper. I would suggest that the authors update the camera-ready version in accordance with all reviewer feedback, with particular focus on the motivation, related work and contribution sections of the paper.