
Authors

Haonan Wang, Xiaomeng Li

Abstract

The volume-wise labeling of 3D medical images is expertise-demanding and time-consuming; hence semi-supervised learning (SSL) is highly desirable for training with limited labeled data. Imbalanced class distribution is a severe problem that bottlenecks the real-world application of these methods but has received little attention. Aiming to solve this issue, we present a novel Dual-debiased Heterogeneous Co-training (DHC) framework for semi-supervised 3D medical image segmentation. Specifically, we propose two loss weighting strategies, namely Distribution-aware Debiased Weighting (DistDW) and Difficulty-aware Debiased Weighting (DiffDW), which leverage the pseudo labels dynamically to guide the model in solving data and learning biases. The framework improves significantly by co-training these two diverse and accurate sub-models. We also introduce more representative benchmarks for class-imbalanced semi-supervised medical image segmentation, which can fully demonstrate the efficacy of the class-imbalance designs. Experiments show that our proposed framework brings significant improvements by using pseudo labels for debiasing and alleviating the class imbalance problem. More importantly, our method outperforms the state-of-the-art SSL methods, demonstrating the potential of our framework for the more challenging SSL setting. Code and models are available at: https://github.com/xmed-lab/DHC
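To make the mechanics concrete before the reviews, here is a minimal PyTorch sketch of the weighting-plus-co-training idea the abstract describes. This is an editorial illustration, not the authors' code: both sub-models use a simple frequency-based (DistDW-style) weight here, whereas the paper pairs a distribution-aware with a difficulty-aware weighting, and all names (`model_a`, `class_weights_from_pseudo`, ...) are hypothetical.

```python
import torch
import torch.nn.functional as F

def class_weights_from_pseudo(pseudo, num_classes, eps=1e-6):
    # Count voxels per class in the current pseudo labels and invert the
    # frequencies so that rarer classes receive larger loss weights.
    counts = torch.bincount(pseudo.flatten(), minlength=num_classes).float()
    freq = counts / counts.sum().clamp(min=1.0)
    weights = 1.0 / (freq + eps)
    return weights / weights.mean()  # keep the average weight near 1

def co_training_step(model_a, model_b, x_unlabeled, num_classes):
    # Each sub-model is supervised by the other's pseudo labels, with class
    # weights re-estimated dynamically from those pseudo labels.
    logits_a, logits_b = model_a(x_unlabeled), model_b(x_unlabeled)
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()
    loss_a = F.cross_entropy(
        logits_a, pseudo_b,
        weight=class_weights_from_pseudo(pseudo_b, num_classes))
    loss_b = F.cross_entropy(
        logits_b, pseudo_a,
        weight=class_weights_from_pseudo(pseudo_a, num_classes))
    return loss_a + loss_b
```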

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_56

SharedIt: https://rdcu.be/dnwBQ

Link to the code repository

https://github.com/xmed-lab/DHC

Link to the dataset(s)

https://www.synapse.org/#!Synapse:syn3193805

https://amos22.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

This paper improves the CPS (cross pseudo supervision) approach by using two re-weighting strategies in the loss. The model is trained to gradually focus more on hard cases and minority classes during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed reweighting strategies are straightforward and easy to implement.

    2. The improvement over other semi-supervised methods is prominent.

    3. They provide concrete evidence (Fig. 4) to show more insights of the weights during training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The method is only evaluated on Synapse. How does this method compare with prior work on the other datasets used in previous papers?

2. DistDW uses the pseudo labels of unlabeled data to adjust the weight for each category. This does not always make sense, because the model might already overestimate the masks of small organs while their overall voxel counts remain small. Since we already have the ground truth of the labeled data, why not use it to set the weights directly?

3. The last four rows in Table 2 are not explained in the main paper.

4. Some details are missing: what is the architecture of the two networks, how are they initialized, and why are CE and Dice losses used for unlabeled data while only CE loss is used for labeled data?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This method is easy to reproduce based on the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. It would be better if the authors could explain why a new benchmark is necessary for semi-supervised segmentation. Class imbalance also exists in the datasets used in previous papers.

2. Can you respond to my concern about the design of DistDW raised in the weaknesses?

3. Please explain the last four rows in Table 2.

    4. More details (see weaknesses) are needed.

    I’m happy to change my score if these concerns can be cleared.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a reasonable method that yields notable improvements with different percentages of labeled data. Overall this is good work, but some technical weaknesses remain.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

The paper presents a deep-learning-based method for semi-supervised medical image segmentation, which faces challenges due to imbalanced class distribution under the condition of limited labels. This challenge leads to data bias and learning bias, which are addressed by the proposed DistDW and DiffDW strategies. Experiments on Synapse with 13 foreground classes are included to illustrate the effectiveness of the proposed algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper points out the critical issue associated with imbalanced class distribution in medical image segmentation and proposes sensible strategies to address the challenge due to this issue.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The experiments are based on only one organ-segmentation dataset, which is not enough to substantiate the claim for general medical image segmentation.

It seems that the paper uses the CPS baseline as the network, but the input and output of this network are not clearly described.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors state that their code and models will be released upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

It is a bit of a stretch to claim this work as a general medical image segmentation method, as the experiments are based on only one dataset. It would be ideal to add at least one more dataset.

    The abbreviation CPS appears without a full name.

dice -> Dice (in this context, it refers to the scientist Dice)

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper introduces an overlooked yet critical issue in semi-supervised medical image segmentation and presents strategies for addressing this. The experimental part lacks some details and also needs at least one more dataset to support the claim.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present a novel Dual-debiased Heterogeneous Co-training (DHC) framework for semi-supervised 3D medical image segmentation to deal with imbalanced class distribution problems. Specifically, two loss weighting strategies, Distribution-aware Debiased Weighting (DistDW) and Difficulty-aware Debiased Weighting (DiffDW) were proposed to solve data and learning biases. The proposed method outperformed the state-of-the-art SSL methods, showing the potential of the proposed framework for the more challenging SSL setting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

As summarized in the contribution above, the proposed DHC framework with its DistDW and DiffDW weighting strategies outperforms state-of-the-art SSL methods. In general, class imbalance in semi-supervised learning is an inherent problem that has received little attention, so the core idea is interesting and of practical value.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Although the core idea is interesting, the method is too complex, and the method section is poorly organized, making it hard to read and understand. In addition, the motivation is not clear: it is not explained why or how the imbalance problem affects the results. In Table 1, why not use the focal loss for all general SSL methods? Also, the supervised performance of V-Net is too poor, so I do not think the baseline results are reasonable and convincing. To my understanding, there are many large-scale, class-imbalanced multi-class datasets (FLAIR, WORD, AMOS); why not show performance on one of them? BTCV has only 30 volumes, which is too small to demonstrate the authors' claim. Finally, the authors claim that their method achieves significant improvements over others, but no statistical results are provided to support this, so I do not agree with the claim.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Too complex to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See the weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weaknesses.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

After reading the rebuttal, I would like to raise my score from reject to weak accept. The authors addressed the major concerns; some minor concerns remain that should be addressed in the final revision. Thanks to the authors for their feedback and efforts.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes a method for semi-supervised medical image segmentation that incorporates two re-weighting strategies in the loss. The strengths of the paper lie in the straightforward implementation of the reweighting strategies, the prominent improvement over other semi-supervised methods, and the concrete evidence provided to understand the weights during training. The core idea of addressing imbalanced class distribution in semi-supervised learning is interesting and practical. However, the weaknesses include evaluation on only one dataset, the unclear rationale behind using pseudo labels for weight adjustment, unexplained rows in the results table, and missing details in the methodology section. The clarity and organization of the paper are rated as good, and reproducibility is feasible based on the provided information. The rebuttal should include clearer explanations and results on additional datasets.




Author Feedback

We thank the meta-reviewer and reviewers for their time, valuable feedback, and recommendations for improvements. Overall, reviewers highly appreciated the fact that our paper addresses an overlooked yet critical problem (R2). They found our method to be interesting, practical, reasonable, and straightforward (R1, R3). R1 also acknowledged the significant performance improvements achieved by our method. The main concerns include experiments on an additional dataset (R2, R3), the rationale behind our DistDW (R1), unexplained rows in Table 2 (R1), and method details (R1, R2). R3 expressed concern that our method is too complex. However, as appreciated by R1, our method is easy to understand and effective. We have made our code available anonymously at: anonymous.4open.science/r/DHC. This codebase contains all the compared methods, making it convenient for further research and beneficial for the community.

Additional datasets (R1, R2 & R3). We have conducted new experiments on a large-scale dataset, AMOS, comprising 360 abdominal CT and MR images with annotations of 15 organs (duodenum, bladder, prostate/uterus, etc.). We divided the dataset into 216, 24, and 120 volumes for training, validation, and testing, respectively. V-Net (fully supervised) achieves 76.5%. The results obtained using 2%, 5%, and 10% labeled data are as follows:

Method | 2% | 5% | 10%
CPS (baseline) | 31.78% | 41.08% | 54.51%
CLD (MICCAI'22) | 36.23% | 46.10% | 61.55%
DHC (ours) | 38.28% | 49.53% | 64.16%

Why use pseudo labels for weight adjustment (R1)? The prior SOTA, CLD (Lin et al., MICCAI 2022), relied solely on the ground truth of the labeled data for class-wise weighting, which led to limited performance because the scarce labels could not fully represent the holistic data distribution. In our paper, we found that incorporating pseudo labels enabled us to capture the holistic learning effect for both labeled and unlabeled data, thereby improving SSL. We also tested using only the ground truth of the labeled data for DistDW, but it showed inferior results (a 4% Dice drop).
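A small NumPy sketch of the point being made: estimating the class distribution from labeled ground truth alone versus pooling it with pseudo labels predicted on the (much larger) unlabeled set. The simple pooling rule and the toy data are assumptions for illustration; the paper's exact estimator may differ.

```python
import numpy as np

def class_distribution(label_maps, num_classes):
    # Voxel-level class frequencies over a list of integer label volumes.
    counts = np.zeros(num_classes, dtype=np.float64)
    for lab in label_maps:
        counts += np.bincount(lab.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Toy volumes standing in for real data: few labeled, many unlabeled.
rng = np.random.default_rng(0)
gt_labeled = [rng.integers(0, 14, size=(8, 8, 8)) for _ in range(2)]
pseudo_unlabeled = [rng.integers(0, 14, size=(8, 8, 8)) for _ in range(20)]

dist_labeled = class_distribution(gt_labeled, 14)                       # scarce-label view
dist_holistic = class_distribution(gt_labeled + pseudo_unlabeled, 14)   # pooled view
```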

Unexplained rows in Table 2 (R1). The last four rows are combinations of our proposed modules (e.g., DistDW) with existing class-imbalance methods (e.g., CReST). These results verify the importance of heterogeneity in solving the over-fitting problem of CPS and serve as the detailed results behind the 3rd and 4th columns of Fig. 2.

The necessity of a class-imbalanced benchmark for SSL (R1 & R3). The LASeg dataset, which contains a single foreground class, serves as the predominant benchmark for evaluating SSL methods in medical image segmentation. However, as the results in Table 1 demonstrate, existing methods struggle when applied to more realistic multi-class segmentation scenarios. There is therefore a pressing need for a more challenging benchmark that accurately assesses the performance of SSL methods in real-world applications.

Concerns about the "bad performance of our baseline" (R3). We clarify that our baseline is better than the V-Net results reported by previous work, e.g., TransUNet (arXiv'21), on the 8 classes. The overall Dice values in those papers are higher only because they omitted the minority classes.

The imbalance issue cannot be solved by simply using focal loss (R3). We add new experiments on the 20% labeled Synapse dataset to show that the imbalance issue cannot be solved by using focal loss alone: the Dice values of the minority classes (ES, RAG, LAG) remain zero.

Method | Avg Dice | … | ES | RAG | LAG
DePL | 36.23% | … | 0.0 | 0.0 | 0.0
DePL w/ focal loss | 37.41% | … | 0.0 | 0.0 | 0.0
DHC | 46.06% | … | 10.5 | 31.2 | 10.9
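For reference, a standard multi-class focal loss (Lin et al., ICCV 2017) in PyTorch: it only down-weights easy voxels and carries no explicit per-class reweighting, which is consistent with the observation above that rare classes can still collapse to zero Dice. This is a generic implementation, not the exact variant used in the cited experiment.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    # Per-voxel cross entropy, then down-weight confident (easy) voxels by
    # (1 - p_t)^gamma; nothing here boosts rare classes specifically.
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, target, reduction="none")  # -log p_t per voxel
    p_t = torch.exp(-ce)
    return ((1.0 - p_t) ** gamma * ce).mean()
```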

Statistical tests to verify significance (R3). We further applied Student's t-test with alpha = 0.05 to verify the significance of the improvement. The p-values of DHC over CPS and CLD on the 20% labeled Synapse dataset are 0.00077 and 0.0147, respectively, confirming that the improvements are significant.
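A sketch of such a significance check with SciPy. The rebuttal does not state whether the test was paired, so the paired variant and the per-case Dice values below are assumptions for illustration.

```python
from scipy import stats

# Per-case Dice scores for two methods (made-up numbers for illustration).
dice_dhc = [0.48, 0.51, 0.44, 0.47, 0.50, 0.46]
dice_cps = [0.33, 0.36, 0.30, 0.35, 0.34, 0.31]

t_stat, p_value = stats.ttest_rel(dice_dhc, dice_cps)  # paired t-test
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")          # significant if p < 0.05
```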




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal addresses the aforementioned concerns well, and the paper receives a consensus recommendation of accept.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to tackle class-imbalance in semi-supervised segmentation with a distribution-aware and difficulty-aware weighted pseudo-labeling. The reviews appreciate the sensible strategy on the often overlooked problem of class imbalance, but have major concerns on a limited validation with one dataset, complexity of the method, readability issues, unconvincing baseline results, and questions on key validation choices.

While the rebuttal should address choices made during the submission, providing new results on a new dataset after submission remains unfair to the other submissions that were complete on time. Our instructions require reviewers not to consider new experiments. The submission, despite tackling a valid problem, therefore remains insufficient for MICCAI standards.

    For these reasons, and situating the work with respect to the other completed submissions, the recommendation is towards Rejection.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper is on the fence, with the three reviewers weakly recommending its acceptance. After reading the reviews and the authors' rebuttal, I think the authors did a good job addressing the raised concerns. In particular, they included an additional dataset, which better showcases the generalizability of the proposed method (though I would strongly recommend evaluating this and other approaches on a more extensive list of datasets). Furthermore, the authors convincingly answered the concern about the use of pseudo labels, as well as the motivation for considering class imbalance in semi-supervised segmentation, on which I side with the authors that it is an important problem. Considering all this, and despite the limited enthusiasm from the reviewers, I recommend acceptance of this work, and strongly encourage the authors to use all this positive feedback and criticism to improve the camera-ready version.


