
Authors

Yuchen Yuan, Xi Wang, Xikai Yang, Ruijiang Li, Pheng-Ann Heng

Abstract

Despite great progress in semi-supervised learning (SSL), which leverages unlabeled data to improve performance over fully supervised models, existing SSL approaches still fail to achieve good results when faced with severe class imbalance in medical image segmentation. In this work, we propose a novel mean-teacher based class-imbalanced learning framework for cardiac magnetic resonance imaging (MRI) segmentation, which can effectively conquer the problems of class imbalance and limited labeled data simultaneously. Specifically, in parallel to the traditional linear classifier, we additionally train a prototype-based classifier that makes dense predictions by matching test samples with a set of prototypes. The prototypes are iteratively updated by in-class features encoded in the entire sample set, which can better guide the model training by alleviating the class-wise bias exhibited in each individual sample. To reduce the noise in the pseudo labels, we propose a cascaded refining strategy that utilizes two multi-level tree filters built upon pairwise pixel similarity in terms of intensity values and semantic features. With the assistance of these affinities, soft pseudo labels are properly refined on-the-fly. Upon evaluation on ACDC and MMWHS, two cardiac MRI datasets with a prominent class imbalance problem, the proposed method demonstrates superiority over several state-of-the-art methods, especially in the case where few annotations are available.
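The prototype-matching idea described in the abstract can be sketched as follows. This is a minimal illustration only, with hypothetical shapes, a cosine-similarity matcher, and an EMA update rule; the paper's exact matching function and prototype update may differ:

```python
import numpy as np

def prototype_predict(features, prototypes):
    """Dense prediction by matching each pixel feature (N, D) to the most
    similar class prototype (C, D) via cosine similarity.
    Illustrative sketch, not the paper's exact formulation."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    sim = f @ p.T                  # (N, C) similarity scores
    return sim.argmax(axis=-1)     # per-pixel class labels

def update_prototypes(prototypes, features, labels, momentum=0.99):
    """EMA update of each class prototype with the mean in-class feature
    from the current batch, so prototypes aggregate in-class statistics
    across samples rather than reflecting any single image."""
    new = prototypes.copy()
    for c in range(prototypes.shape[0]):
        mask = labels == c
        if mask.any():
            new[c] = momentum * prototypes[c] \
                     + (1 - momentum) * features[mask].mean(axis=0)
    return new
```

In a mean-teacher setting, the prototype-based predictions would be produced alongside the linear classifier's output and the prototypes refreshed each iteration from (pseudo-)labeled features.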

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_44

SharedIt: https://rdcu.be/dnwDS

Link to the code repository

https://github.com/IsYuchenYuan/SSCI

Link to the dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html

https://zmiclab.github.io/zxh/0/mmwhs/


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper proposes a method for semi-supervised learning on imbalanced data for cardiac MRI segmentation. The method trains a mean-teacher based framework for this problem by introducing a prototype classifier and a pseudo-label refiner. This way, the method exploits the potential of the prototype classifier to deal with class imbalance. The paper contains experiments on two cardiac datasets and presents ablation studies showing the contribution of each term. The results show significant improvement over the existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Semi-supervised learning under class imbalance is an under-explored but important problem. The paper’s attempt to solve this problem is an important contribution to improving the performance of semi-supervised learning methods.
    • Using prototype classifiers to deal with class imbalance is an interesting idea, and the results show that it works well.
    • The experiments on two different datasets are satisfactory and show the potential of the proposed method. The comparisons with the existing methods are also sufficient, except for strong baselines from the class-imbalance literature.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I think the main weakness of the method is the lack of comparisons with supervised losses that are designed for class imbalance, such as [1, 2, 3]. Many of these methods can be directly used to replace the supervised loss in the semi-supervised setting. In the current setting, none of the compared methods was proposed to deal with class imbalance. A simple baseline would be designing experiments where the supervised losses in these methods are replaced with a supervised loss designed for class imbalance. I think comparisons with such a strong baseline are crucial for this work.

    [1] Li et al. Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation.
    [2] Lin et al. Focal Loss for Dense Object Detection.
    [3] Sangalli et al. Constrained Optimization to Train Neural Networks on Critical and Under-Represented Classes.

    • The description of the pseudo-label refinement module is difficult to understand. I didn’t really understand every step of this module. This section should definitely be improved for clarity.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There are some missing details that make reproducing the results difficult. Especially, I didn’t quite understand the pseudo-label refinement step. Also, the authors didn’t promise to publish their code upon acceptance. I strongly suggest sharing the code since implementing the method from the description in the paper is a bit difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I mostly have positive opinions about the paper. The problem of class imbalance in the semi-supervised learning setting is very common; however, it has received limited attention so far. The idea of using a prototype classifier to deal with this issue is quite interesting and seems to work well.

    However, I am concerned that comparisons with existing semi-supervised class-imbalance methods are missing. I understand that there may not be much to compare with in this literature. However, there are many papers that propose loss functions to deal with class imbalance in the supervised setting. I think the existing semi-supervised methods can simply be extended by replacing the current supervised losses with one of the losses proposed for class imbalance. This would create a strong baseline for the existing literature and better show the improvement achieved by the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the lack of comparisons with a strong semi-supervised method for class imbalance is a crucial shortcoming of the paper. Although I think that the paper is quite well-written, the ideas are interesting, and the results are promising, this weakness is crucial and hinders me from suggesting acceptance for this paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for addressing reviewers’ concerns during the rebuttal. I think that the proposed idea is interesting, the paper is well-written, and it contains quite detailed experiments, except for comparison with the stronger baselines I mentioned in my review. The authors kindly provided additional results by adding asymmetric focal loss to Self-training and MixMatch, and showed that the proposed method still performs better despite the improvement achieved by AFL.

    I think these experiments are quite valuable. However, the paper requires a more in-depth comparison with the methods from the class-imbalance literature. Also, incorporating these results to the paper requires another major revision round, which is not possible for this conference. Therefore, I keep my initial rating.



Review #3

  • Please describe the contribution of the paper

    This paper addresses the problem of imbalanced classes in semi-supervised cardiac MRI segmentation. The authors adopt a teacher-student framework for semi-supervised learning. The training of a prototype-based classifier, as well as the use of multi-level tree filters are new in this framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Novel elements are proposed within the teacher-student framework for semi-supervised learning including prototype-based classifiers and multilevel tree filters. (2) Experiments are fully conducted on hyperparameters, ablation studies, and method comparison. (3) The proposed model was evaluated on two public datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The individual performance metrics for the three classes for the ACDC dataset should be given. (2) Clinical metrics for the ACDC dataset on ejection fraction, left and right ventricles volumes, myocardium mass at ED and ES separately, with comparison to other methods would be useful for assessing the performance of the proposed algorithm.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good. Datasets are publicly available, and methods are described with acceptable detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The comments on the weaknesses could be considered here.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is novel and the subject very interesting; however, the per-class performance is needed, as well as the evaluation of clinical metrics.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have adequately answered the questions I raised, and in my opinion those of the other reviewers. I have therefore raised the score by one point.



Review #5

  • Please describe the contribution of the paper

    This paper proposes a semi-supervised learning framework for medical image segmentation, a setting characterized by scarcity of annotations and class imbalance. The authors propose a novel prototype-based classifier and multi-level tree filters to tackle these problems.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors proposed a novel method named Prototype-based classifier to address the problem of class imbalance in medical image segmentation.
    • The authors proposed a novel multi-level tree based method for pseudo-label refinement in SSL.
    • Empirical results demonstrate the efficacy of the proposed PC classifier and multi-level TF.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The high-level TF F_high was shown to be able to rescue failures of F_low, but the benefit of F_low itself is not clearly discussed.
    • The inference time is not discussed in this work. Hybrid networks, constructing pixel-wise MST and performing TF are probably time-consuming.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It might be hard to reproduce as the code would not be available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • It might be interesting to list the time cost of constructing the pixel-wise graph and performing the tree filtering.
    • Listing the results of using high-level TF only would be helpful to demonstrate the advantage of multi-level filtering.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed novel strategies for alleviating the scarcity of annotation and class imbalance in medical image segmentation, the paper was clearly presented and all claims were supported by the results. Making the implementation open-source would make the work more reproducible.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes to mitigate the class imbalance problem in semi-supervised segmentation with a mean-teacher approach. The originality resides in leveraging a prototype-based classifier that iteratively guides the training process, and a cascaded strategy that reduces noise in internal pseudo-labels. The evaluation is on two cardiac MRI datasets. The reviews have overall positive comments on the relevance of handling class imbalance in semi-supervision, but the scores remain mixed, ranging from weak rejection to acceptance. The authors are therefore invited to address the following main concern in a rebuttal:

    • Comparison with class-imbalance literature - The contribution focuses on addressing the class imbalance problem in semi-supervision (R1). A discussion of the omission of a comparison with literature directly handling class imbalance will help clarify doubts about the evaluation choices.




Author Feedback

We thank all reviewers for their appreciation of our methodological contributions and their constructive comments for further improvement.

  1. Comparison with class-imbalance literature (MR&R2): Some existing methods in Table 1 have considered the class imbalance issue. E.g., Class-wise Sampling [11] improved segmentation performance of tail classes by using dynamic weighting based on category-wise confidence scores. Data Aug [24] and Global+Local CL [26] used the re-weighting loss (i.e., class-wise weighted cross entropy). Following the suggestion of R2, we extend the remaining two SSL methods (i.e., Self train [7] and Mixmatch [25]) by replacing the supervised losses with Asymmetric Focal Loss (AFL) [a], which was verified to be better than Focal Loss for addressing the class imbalance issue. When equipped with AFL, [7] and [25] achieved the Avg DSC of 0.533/0.670/0.804 and 0.579/0.701/0.804 using 10%/20%/40% labeled MMWHS training data, and 0.721/0.853/0.880 and 0.691/0.818/0.859 using 1.25%/2.5%/10% labeled ACDC training data. Although the performance was improved with the use of AFL, it is still worse than ours. Furthermore, we compare with another SOTA method [b] handling class imbalance in SSL. It achieves the Avg DSC of 0.561/0.682/0.787 using 10%/20%/40% labeled MMWHS training data, and 0.746/0.830/0.853 using 1.25%/2.5%/10% labeled ACDC training data, which is also inferior to our method. [a] Li et al. Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation. [b] Lin et al. Calibrating label distribution for class-imbalanced barely-supervised knee segmentation.
  2. Clarify the refinement module (R2): We have polished this module with more details in the revision.
  3. Give the per class performance for the ACDC dataset (R3): The Dice of our method on RV/MYO/LV are 0.898/0.874/0.949 using 10% labeled data.
  4. Evaluation of clinical metrics for the ACDC dataset (R3): The correlation coefficient of left ventricles volumes at ED\ES, right ventricles volumes at ED\ES, myocardium mass at ED, myocardium volume at ES, left\right ventricles’ ejection fraction of our method and Class-wise Sampling [11] using 10% labeled data (due to the rebuttal length limitation, we only report the comparison results of the newest method here) are 0.981\0.987\0.889\0.840\0.960\0.970\0.983\0.830 and 0.887\0.904\0.844\0.805\0.944\0.920\0.854\0.734, respectively, demonstrating the superiority of our method on the clinical metrics.
  5. Discuss the benefit of low-level TF (R5): The low-level TF shows its efficacy in performing fine-grained boundary refinement, as it encodes accurate intensity gradient information. In contrast, adopting only the high-level TF results in coarse segmentation boundaries due to the absence of low-level details (e.g., Fig. 1 (b) in the supplementary material). Quantitatively, using multi-level TFs achieves better performance than a single high-level TF (Avg DSC: 0.840 vs. 0.815) on 40% labeled MMWHS data, demonstrating its indispensability.
  6. Inference time (R5): Please note that constructing MSTs and performing TF are not needed during inference (0.35s time cost in the training phase), and the segmentation results can be directly produced by the trained H-UNet. The inference time (seconds) for a single MMWHS MRI volume of comparison methods [7,24,25,26,11] are 20.29, 28.36, 20.29, 22.40, and 69.95 respectively. Although our method has a longer inference time (59.48s) due to the hybrid design, it is still acceptable with far better performance than the others.
  7. Reproducibility (R2&R5): Our code will be released at https://github.com/IsYuchenYuan/SSCI.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has provided further results to build confidence in situating the work with respect to related work. The rebuttal should, however, focus on clarifying the choices made at submission and avoid providing new post-submission results. This is the main weakness of the submission and rebuttal. The scientific merit of using a prototype-based classifier remains positive, but the comparison with the class-imbalance literature should be strengthened in a future submission.

    For all these reasons and situating this work with respect to the other submissions, the recommendation is towards Rejection.

    The final decision will be a consensus with the other co-meta-reviews.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers agreed on the merit of the paper, although R1 raised concerns about insufficient experiments. The authors have made an effort in the rebuttal to include more literature on class imbalance. I personally find that comparisons can never be fully sufficient, especially for a conference publication. The authors are, however, encouraged to extend the evaluation and comparison to SOTA if more space is allowed in a journal publication.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This submission received mixed scores during the review process, with the major concern related to the lack of comparisons with related works on class imbalance. While adding new experiments in the rebuttal is not mandatory and authors should not be penalized for their absence, I believe that the empirical validation of this work is weak and unconvincing. In particular, the authors address an important problem (semi-supervised class-imbalanced segmentation) for which the literature on both semi-supervised and class-imbalanced learning is abundant. Nevertheless, the selected setting does not demonstrate the benefits of the proposed approach. In particular, I do not consider the ACDC dataset to be an imbalanced dataset at all, and evidence of this is the large number of standard semi-supervised segmentation methods that are evaluated on this popular benchmark and outperform the results reported by this approach (the same applies to MMWHS). The authors should have included the performance of these semi-supervised methods to motivate their arguments, as my experience with these datasets makes me reluctant about the reported results. Furthermore, choosing actually imbalanced datasets (e.g., brain lesions) and comparing to some of the large body of literature on imbalanced segmentation would have strengthened this work. Thus, based on these comments I recommend the rejection of this paper, as adding all these changes would require a major revision.


