Authors

Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jas Sekhon, James S. Duncan

Abstract

Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes, i.e., boundary regions or rare objects. Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic temperature via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
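The off-line step described in the abstract (placing class centers so that they are spread uniformly over the embedding space) can be illustrated with a small sketch. This is not the authors' implementation: it assumes unit-norm centers on a hypersphere and a log-sum-exp uniformity objective minimized by projected gradient descent, and the function names, `tau`, `lr`, `steps`, and `seed` are all hypothetical choices made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (returned unchanged if its norm is zero)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def uniformity_loss(centers, tau):
    """Mean over i of log sum_{j != i} exp(c_i . c_j / tau); lower means more spread out."""
    k = len(centers)
    total = 0.0
    for i in range(k):
        total += math.log(sum(math.exp(dot(centers[i], centers[j]) / tau)
                              for j in range(k) if j != i))
    return total / k


def fit_class_centers(num_classes, dim, tau=0.5, lr=0.1, steps=2000, seed=0):
    """Assumed off-line step: push unit-norm class centers apart on the sphere."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    rng = random.Random(seed)
    centers = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
               for _ in range(num_classes)]
    for _ in range(steps):
        # weights[i][j]: softmax weight of center j as seen from center i (diagonal excluded)
        weights = []
        for i in range(num_classes):
            row = [math.exp(dot(centers[i], centers[j]) / tau) if j != i else 0.0
                   for j in range(num_classes)]
            z = sum(row)
            weights.append([w / z for w in row])
        new_centers = []
        for i in range(num_classes):
            # Gradient of uniformity_loss with respect to center i
            grad = [0.0] * dim
            for j in range(num_classes):
                if j == i:
                    continue
                coeff = (weights[i][j] + weights[j][i]) / (tau * num_classes)
                for d in range(dim):
                    grad[d] += coeff * centers[j][d]
            # Gradient step followed by projection back onto the unit sphere
            new_centers.append(normalize([c - lr * g for c, g in zip(centers[i], grad)]))
        centers = new_centers
    return centers


centers = fit_class_centers(num_classes=3, dim=2)
```

With three classes in two dimensions, the centers in this sketch should settle roughly 120 degrees apart. In the online stage the fixed centers would then serve as targets that class features are pulled toward; how that matching is done is not specified in this abstract.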

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_19

SharedIt: https://rdcu.be/dnwC3

Link to the code repository

https://github.com/charlesyou999648/ACTION

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a solution for the heavy class imbalance problem in medical imaging datasets with limited annotation by introducing a contrastive learning framework with adaptive anatomical contrast. They compute optimal class center locations and match class features to these centers to achieve uniformly distributed class centers. They also use a dynamic temperature coefficient with a cosine schedule to improve separation between majority and minority classes. The authors evaluated the proposed method on two public datasets and demonstrated improvements over compared methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Overall, the article is well-written and easy to follow. However, a few portions of the methods section were not clear.
2. The relevant works section is well done.
3. The experiments section is clear and most of the required details are provided.
4. The evaluation on two cardiac datasets with different label distributions helps readers understand the benefits of the proposed method.
5. Authors perform plenty of ablation experiments to demonstrate effectiveness of each loss component or hyper-parameter.
6. There are enough comparisons provided from the literature.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Regarding the contribution of this work over the previously cited work [28]: some details are mentioned, but it is not clear from the current text what the additional methodological contribution of this work over [28] is (or exactly which additional loss terms are introduced). Can the authors please be more precise about what exactly was done in [28] and what the additional contributions of this work are?
2. The current text lacks clarity on the training process, specifically which loss is trained at each stage and which parameters of the network are updated. Therefore, the authors are requested to clarify and provide more details on the steps of the training process, as well as which losses and corresponding parameters are learned at each step according to the diagram as presented in Figure 2.
3. In the methods section: a. The local pixel loss objective is not clearly defined on page 4. Can the authors provide more details and clearly state whether they match pixel-level features using only the labeled data mask information, or whether they also use unlabeled data to define this loss via proxy ground truths such as pseudo-labels? b. Additionally, L_{unsup} at the end of page 4 is not defined. Can the authors please provide this definition? c. It is not clear at which stage the loss defined on page 5 as L_{aaco} is used and where it fits in Fig. 2. Can the authors please include these loss symbols in Fig. 2 to make it easier for readers to follow?
4. In the introduction, on page 2, the authors mention group and pixel level discrimination in the paragraph before describing their contribution. Can the authors please clarify what is the difference between them? What do they mean by group in this context - is it image level information/label or aggregated label across multiple pixels instead of 1 pixel?
5. In Figure 2: a. There is a typo in the “global pre-training step” sub-figure: the loss is labeled “unsupervised local instance discrimination”. It should read “global” instead. Can the authors please fix this? b. In the last sub-figure, can the authors provide the symbols (like L_{anco}/L_{acco}/L_{unsup}, etc.) used to define the losses (e.g., Eq. 2, 5) and a brief description of these symbols in the caption?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors have provided enough details in the paper. It would be great if they can release the code after publication.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
I have provided my comments on the weaknesses and suggested improvements for the article in section 6 above.

Some additional comments are mentioned below:
1. It would also be great to see a t-SNE or similar plot comparing previous work against the proposed work, to check whether the clusters of majority and minority class features indeed improve with the proposed work on both evaluated datasets.
2. For evaluation, it would have been better to use a dataset with more foreground classes (like ACDC) instead of the LA dataset used here, which has only 1 foreground class, to show the benefits of the proposed method across different distributions of foreground classes. In future revisions, the authors could consider including a dataset with more foreground labels (>3), such as abdominal CT datasets with multiple organ segmentations or brain MR datasets.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The presentation of the methods section and its details lack clarity. Moreover, the precise contributions of this work over previous work [28] are not adequately presented.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper proposes two improved modules for the existing work ACTION, namely the Supervised Adaptive Anatomical Contrast and the Anatomical-aware Temperature Scheduler, to address previous problems of unsupervised and semi-supervised contrastive learning in long-tailed medical segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The motivation is clear: the authors point out the lack of supervised CL for addressing long-tailed medical segmentation, as well as the drawbacks of a constant temperature parameter.
- The method is reasonable regarding solving the two above problems.
- The experiments are well conducted with the ablation on the dynamic temperature.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The contribution is quite marginal with incremental modules.
- The temperature scheduler is not well explained. The intuition behind why the authors schedule the temperature the way they do in section 2.3 is unclear to me.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think the paper is reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

I think the author should address my concern on the intuition of the temperature scheduler.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The motivations are clear and the method is reasonable.
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors propose an improved contrastive learning framework with adaptive anatomical contrast (ACTION++) for semi-supervised medical segmentation. It consists of 1) an adaptive supervised contrastive loss that encourages different class features to adaptively match distinct and uniformly distributed class centers and 2) a dynamic $\tau$ via a simple cosine schedule in the contrastive loss. Experiments on 2 public datasets, ACDC and LA, demonstrate that ACTION++ consistently outperforms current state-of-the-art SSL methods. The authors also theoretically analyze the effectiveness of ACTION++.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well-written and easy to follow. The paper clearly demonstrates the workflow and the novelty is clarified in the paragraph Method. The paper demonstrates the extensive experimental results on 2 public datasets. The authors provide a clear theoretical analysis of the proposed approach.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The proposed method is very interesting. There is no obvious weakness of the proposed work.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is sufficient.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The proposed method is interesting and the paper is well organized.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

8
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper has a very clear presentation and logic. It clearly shows the superiority of the proposed method from both empirical and theoretical views. The experiments are solid. I highly recommend accepting this paper as an oral presentation and nominating this work.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers are positive on the paper’s quality. Congratulations!

Author Feedback

To R1: Thanks!

contribution: [28] proposed the SOTA CL-based method for medical image segmentation. Inspired by [28], we improve its training framework with two key techniques: (1) supervised contrastive loss, which guides the model to yield well-separated and uniformly-distributed features for both the head and tail classes; (2) temperature scheduler, which allows the model to learn both class-wise and instance-wise features. Experiments show the effectiveness of our proposed method, and theoretical analysis confirms the superiority in label efficiency. We will follow your advice to revise our paper.

clarify the training process: Thank you for the advice. We had to omit some relatively minor details due to the extremely limited space. We will try our best to add more details in the revised version. We will also release our code, which we believe will provide all the necessary details to the reader.

local pixel loss: The local loss is KL-divergence loss between the similarity of student-generated logits and teacher-generated logits, with respect to anchor images. The loss is unsupervised loss, since no label is needed in the KL-divergence loss. Due to the space limit, please refer to [28], section 2 Local Contrastive Distillation Pre Training. The super_loss in stage 2 of Fig. 2 is a separate loss L_sup.
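A minimal sketch of the kind of label-free KL-divergence loss described in this answer follows. It is not taken from the paper or its code: it assumes cosine similarity to a shared set of anchor features, a softmax with a hypothetical temperature `tau`, and the direction KL(teacher || student). All function names are illustrative.

```python
import math


def softmax(scores, tau):
    """Temperature-scaled softmax, shifted by the max score for numerical stability."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    num = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return num / (nu * nv + eps)


def similarity_distribution(query, anchors, tau=0.5):
    """Distribution over anchors induced by the query's similarity to each anchor."""
    return softmax([cosine(query, a) for a in anchors], tau)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def local_distillation_loss(student_query, teacher_query, anchors, tau=0.5):
    """No labels are used: the teacher's similarity distribution is the target."""
    p_teacher = similarity_distribution(teacher_query, anchors, tau)
    p_student = similarity_distribution(student_query, anchors, tau)
    return kl_divergence(p_teacher, p_student)


anchors = [[1.0, 0.0], [0.0, 1.0]]
loss_same = local_distillation_loss([1.0, 0.0], [1.0, 0.0], anchors)
loss_diff = local_distillation_loss([0.0, 1.0], [1.0, 0.0], anchors)
```

In this sketch the loss is zero when student and teacher features agree and grows as their similarity patterns over the anchors diverge, which is why no ground-truth mask is needed.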

L_{unsup}: L_{unsup} is the cross-entropy loss on the unlabeled data using pseudo-labels of high-confidence pixels generated by the teacher model. This is also illustrated in the 3rd stage in Fig.2. Please refer to [28].
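The definition of L_{unsup} given here can be sketched as follows, under stated assumptions: pixels are represented as lists of per-class logits, the teacher's arg-max class is the pseudo-label, and a pixel counts as high-confidence when the teacher's top probability reaches a hypothetical `threshold` (the value 0.9 is illustrative and not from the paper).

```python
import math


def softmax(logits):
    """Softmax over one pixel's class logits, shifted for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pseudo_label_ce(student_logits, teacher_logits, threshold=0.9):
    """Mean cross-entropy over pixels where the teacher is confident.

    Returns (loss, number_of_pixels_kept); the loss is 0.0 if no pixel is kept.
    """
    total, kept = 0.0, 0
    for s_logit, t_logit in zip(student_logits, teacher_logits):
        t_prob = softmax(t_logit)
        confidence = max(t_prob)
        if confidence < threshold:
            continue  # skip low-confidence pixels
        label = t_prob.index(confidence)  # teacher's pseudo-label
        s_prob = softmax(s_logit)
        total += -math.log(max(s_prob[label], 1e-12))
        kept += 1
    return (total / kept if kept else 0.0), kept


# Two pixels, three classes: only the first pixel passes the confidence threshold.
teacher = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
student = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
loss, kept = pseudo_label_ce(student, teacher)
```

Masking by confidence keeps noisy teacher predictions from being treated as ground truth; in practice the same idea would be applied to whole prediction tensors rather than Python lists.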

L_{anco}: The loss L_{anco} is used during the anatomical contrast fine-tuning stage, which fits into the 3rd stage in Fig. 2. Thanks for the great advice! We will further clarify according to your suggestion in the paper revision.

group and pixel level discrimination: A group refers to a cluster in the feature space. A cluster consists of multiple (i.e. a group of) feature vectors that are anatomically similar to each other. Note that one pixel corresponds to one feature vector, and therefore a group can equivalently refer to a group of pixels. Group-level discrimination means distinguishing between different groups in the space of features. Pixel-level discrimination means distinguishing between each pixel within each group.

Figure 2: We will follow your advice to revise our final version.

t-SNE: We will try to include this in our final version.

dataset with more foreground labels: We will try to include this in our final version.

To R2: Thank you very much for your recognition!

To R3: Thanks!

contribution: [28] proposed the SOTA CL-based method for medical image segmentation. Inspired by [28], we improve its training framework with two key techniques: (1) supervised contrastive loss, which guides the model to yield well-separated and uniformly-distributed features for both the head and tail classes; (2) temperature scheduler, which allows the model to learn both class-wise and instance-wise features. Experiments show the effectiveness of our proposed method, and theoretical analysis confirms the superiority in label efficiency. We will follow your advice to revise our final version.

temperature scheduler: The intuition behind the temperature scheduler is to learn good group-wise features using a large temperature, as well as good pixel-wise features using a small temperature. This requires the temperature parameter to cover both large and small values during training. Therefore, we design the temperature scheduler so that the temperature changes from tau^+ to tau^- as the iteration t goes from 1 to T. To this end, we can simply use a cosine function, which is periodic (see Table 6). By our design, when t=0, the temperature tau is equal to tau^+; when t=T, the temperature is equal to tau^-. Thank you for the insightful question. We will further clarify in our revision.
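One way to realize the endpoints stated in this answer (tau^+ at t=0, tau^- at t=T) is a half-period cosine schedule. The formula below is an assumption chosen to match those endpoints, not the expression from the paper, and the values of `TAU_PLUS`, `TAU_MINUS`, and `T` are hypothetical.

```python
import math


def cosine_temperature(t, total_steps, tau_plus, tau_minus):
    """Half-period cosine anneal: tau_plus at t=0, tau_minus at t=total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(t, 0), total_steps) / total_steps  # clamp to [0, 1]
    return tau_minus + 0.5 * (tau_plus - tau_minus) * (1.0 + math.cos(math.pi * progress))


# Hypothetical values for illustration only.
TAU_PLUS, TAU_MINUS, T = 1.0, 0.1, 100
schedule = [cosine_temperature(t, T, TAU_PLUS, TAU_MINUS) for t in range(T + 1)]
```

Under this assumed form, early iterations use a large temperature (favoring group-wise structure) and late iterations use a small one (favoring pixel-wise discrimination). A full-period cosine would instead return to tau^+ at t=T, so the period used matters for the stated endpoints.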

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast