Authors

Hritam Basak, Sagnik Ghosal, Ram Sarkar

Abstract

Due to imbalanced and limited data, semi-supervised medical image segmentation methods often fail to produce good performance for certain tail classes. Inadequate training for those particular classes could introduce more noise into the generated pseudo labels, affecting overall learning. To alleviate this shortcoming and identify the under-performing classes, we propose maintaining a confidence array that records class-wise performance during training. A fuzzy fusion of these confidence scores is proposed to adaptively prioritize individual confidence metrics for every sample, in contrast to traditional ensemble approaches, where a set of predefined fixed weights is assigned to all test cases. Further, we introduce a robust class-wise sampling method and dynamic stabilization for a better training strategy. Our proposed method considers all under-performing classes with dynamic weighting and tries to remove most of the noise during training. Upon evaluation on two cardiac MRI datasets, ACDC and MMWHS, our proposed method shows effectiveness and generalizability and outperforms several state-of-the-art methods found in the literature.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_22

SharedIt: https://rdcu.be/cVRY4

Link to the code repository

N/A

Link to the dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html

http://www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes an approach to improve the performance of semi-supervised learning for under-performing classes in imbalanced datasets. The method records confidence indicators (entropy, variance, and confidence) during training for every class. The indicators are combined using fuzzy fusion. A class-sampling scheme uses the confidence score to sample classes. A dynamic training stabilization scheme is also proposed, which redistributes the losses from convincing and under-performing samples.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The ablation study demonstrated the value of RCS, fuzzy fusion, and DTS, when added to the baseline model. The proposed model was evaluated on two different datasets. Performance is not very sensitive to the values of hyperparameters beta and lambda.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper aims to improve the performance of under-performing classes. However, the results do not show the benefits for these classes, as only aggregate metrics such as DSC are reported. There are no results or discussions of the effects on the under-performing/minority classes. The performance improvement of the proposed method for various levels of class imbalance (imbalance factors) needs to be demonstrated. How effective is each component when the classes are extremely imbalanced or fairly balanced? The proposed method was not convincingly superior to SOTA on ACDC: the average DSC was lower for L=1.25% and 2.5%, and very similar to PCL and Global+Local CL at 10%. PCL results for MMWHS are missing. The performance improvements due to each of entropy, variance, and confidence are not clearly demonstrated. In the supplementary material, the evaluation appears to use DTS in the model (judging from the CC results). What is the performance of each indicator without DTS?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory. Datasets are publicly available, and methods are described with acceptable detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors may consider the following to improve the paper: demonstrate the effects on the sensitivity of each class (especially the minority and under-performing classes); demonstrate the value of the proposed method for various levels of class imbalance (imbalance factors); further improve the proposed method to convincingly beat SOTA on ACDC; include PCL results for MMWHS. The writing needs to be polished to improve readability and fix grammatical errors.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main weakness is that the paper aims to improve the performance of under-performing classes, but the results do not show the benefits for those classes. There are no results or discussions of the effects on the under-performing/minority classes.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel approach to address the class imbalance problem in semi-supervised image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is its novelty in algorithm formulation. The proposed training scheme includes several novel elements such as category-wise confidence scores, fuzzy adaptive fusion, class-wise resampling, and dynamic training stabilization.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main contribution of the paper is addressing class imbalance during training. The application is, however, limited to semi-supervised learning, and the evaluation is only performed on cardiac MRI datasets.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper presents enough details to reproduce the results on public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The proposed method is novel and interesting. Can the method be used with fully supervised or self-supervised learning? Can the method be applied to radiological images other than cardiac MRI?

    In Table 1, the PCL method also shows good results for L=2.5% and 10% on ACDC. However, there is no PCL result for the MMWHS dataset; can the authors explain why?

    On page 4, how was the range for R_c^k determined? How were the penalty values P_c^CCF and P_c^FR chosen?

    “ACDC” needs to be defined at its first occurrence in the text.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper focuses on a new training algorithm including a set of new mathematical formulations, and thus the methodological novelty is strong. The details of the experiments and parameter tuning are provided in the paper and in the supplementary material. The method was well evaluated by comparison with other methods, and an ablation study was performed to show the effects of its different components. Finally, the results look promising, although the evaluation was only carried out on cardiac MRI datasets.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal addresses my questions as well as the other reviewers’ concerns. My overall rating is the same as before, which is to accept the paper.



Review #3

  • Please describe the contribution of the paper

    The submission presents a training strategy for a student-teacher network to deal with the class-imbalance problem in cardiac MRI segmentation. By investigating class-wise confidence and class-wise sampling rates, the authors improve the commonly used cross-entropy loss by focusing more on less confident classes, and they further achieve better training stabilization by utilizing dynamic modulation of weights.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The application of cardiac MRI segmentation, the semi-supervised algorithm, and the proposed training strategy are tightly matched. (2) Most of the formulas are written clearly and in detail. (3) Experiments are thoroughly conducted on hyperparameters, ablation studies, and method comparisons. (4) Two public datasets are utilized to demonstrate generalizability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) There are too many formulas and notations, which are hard to follow. (2) It is unclear whether the improvement on under-performing classes will degrade the performance on other classes.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code sharing was mentioned in the submission. The datasets are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) It is highly recommended to make Figure 1 more detailed, so that it includes all of the main notations and shows all components of the proposed training strategy. (2) Please clarify in Equation (6) how entropy, variance, and confidence are fused. (3) Please add comparisons on each individual anatomical object in the ablation studies and/or method comparison to demonstrate that the proposed strategy does not degrade the well-performing classes. (4) Please discuss how, after applying the class-wise sampling rate, choosing pixels on the boundary, inside the object, or purely at random influences segmentation accuracy.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application, the algorithm, and the proposed strategy are tightly matched. The paper can be further improved by clarifying some details.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The rebuttal provides careful and detailed responses.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a method for addressing the class-imbalance problem in semi-supervised segmentation. The reviewers reached consensus on the high novelty of the work. However, reviewers also raised questions on the experimental validation, especially the lack of demonstrated effects on under-performing classes and on the other classes. Please provide feedback addressing these issues in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We would like to thank the AC and Reviewers. We present answers (A) to the major questions (Q) from Reviewers (R) below:

Q1: The results do not show effects on the under-performing classes (R1,R3,AC)
A1: We record class-wise DSC and sensitivity to observe the improvements for under-performing classes. The observed DSC and sensitivity are (0.934, 0.883, 0.877) and (0.964, 0.925, 0.911) for the classes LV, RV, and MYO, respectively, using 10% labelled data of ACDC. Our method achieves an improvement of ≈2-5% for the under-performing classes (RV, MYO) over the baseline and methods from the literature. Similar improvements are observed across classes in the other experimental settings, for both ACDC and MMWHS. We will add these results to the final manuscript.
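
For concreteness, a minimal sketch of how the class-wise metrics quoted above (per-class DSC and sensitivity) can be computed from integer label maps is given below; the function name, array shapes, and class ordering are our illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: per-class DSC and sensitivity from integer label maps.
# Function name, shapes, and class indices are illustrative assumptions.
import numpy as np

def per_class_metrics(pred, gt, num_classes, eps=1e-8):
    """pred, gt: integer label maps of identical shape."""
    dice, sens = [], []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        tp = np.logical_and(p, g).sum()
        dice.append(2 * tp / (p.sum() + g.sum() + eps))  # Dice similarity coefficient
        sens.append(tp / (g.sum() + eps))                # sensitivity (recall)
    return dice, sens

# Hypothetical usage with ACDC-style labels (e.g., 0=background, 1=RV, 2=MYO, 3=LV):
# dsc, sn = per_class_metrics(prediction, ground_truth, num_classes=4)
```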

Q2: Show performance improvements for various class-imbalance ratios (R1)
A2: We have shown good results on ACDC and MMWHS, which have imbalance ratios of 1726.4±233.2 (high imbalance) and 452.1±89.2 (moderate imbalance), respectively. We have also experimented on skin lesion (high imbalance), brain MRI (low imbalance), and kidney (moderate imbalance) datasets with very promising outcomes. Due to the MICCAI page limit, we plan to publish these findings in a future extended version of this work.

Q3: Why is the PCL result missing for the MMWHS dataset? (R1,R2)
A3: To compare our results with the literature, we reported some results directly from those papers, as it is not always possible to reproduce and experiment with them. The PCL paper reported end-to-end segmentation results only for the ACDC and CHD datasets, so we could not include PCL results for the MMWHS dataset.

Q4: What is the performance of each confidence indicator without DTS? (R1)
A4: To assess the contribution of the indicators, we record the model performance when using them individually, both with and without DTS. The best result among these is achieved with the Confidence (C) indicator, as shown in the supplementary file. Performance drops by ≈1-2% in all cases without DTS, justifying its importance. We will modify Table 3 in the supplementary file to accommodate the additional results.
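
As an illustration of what evaluating each indicator individually could involve, the sketch below computes per-class entropy, variance, and confidence from softmax outputs. The exact definitions, in particular what the variance is taken over, are our assumptions, since the rebuttal does not restate them.

```python
# Illustrative per-class confidence indicators from softmax outputs.
# The precise definitions (especially "variance") are our assumptions.
import numpy as np

def class_indicators(probs, labels):
    """probs: (N, C) softmax outputs over N pixels; labels: (N,) predicted classes.
    Returns per-class (entropy, variance, confidence), each of shape (C,)."""
    _, C = probs.shape
    pixel_entropy = -(probs * np.log(probs + 1e-8)).sum(axis=1)
    ent, var, conf = np.zeros(C), np.zeros(C), np.zeros(C)
    for c in range(C):
        mask = labels == c
        if mask.any():
            ent[c] = pixel_entropy[mask].mean()   # mean predictive entropy
            var[c] = probs[mask, c].var()         # spread of class-c probabilities
            conf[c] = probs[mask, c].mean()       # mean softmax confidence
    return ent, var, conf
```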

Q5: How to obtain the range of R_c^k in Eq.6 and the penalty values in Eq.7? (R2)
A5: The Gompertz function R_c^k = 1 - e^{-e^{-2·norm(x_c^k)}} is a decreasing function passing through (0, 0.632) and (1, 0.127). In our case, x_c^k is normalized to the range (0, 1). Substituting these two extreme values into the function gives the upper and lower limits of R_c^k in Eq.6.

As described in the paper, if R_c^k does not belong to the top m ranks, we enforce its contribution towards the cumulative confidence score to be the lowest. norm(x_c^k) has the lowest value 0, and the corresponding R_c^k value is 0.632. Thus, we determine the penalty values in Eq.7 to be 0 and 0.632, respectively.
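
The quoted endpoints can be verified numerically; a quick sketch follows (the function name is ours):

```python
# Numeric check of the decreasing Gompertz mapping R = 1 - exp(-exp(-2x))
# for x = norm(x_c^k) in [0, 1]; reproduces the endpoints quoted above.
import math

def gompertz_rank(x_norm):
    return 1.0 - math.exp(-math.exp(-2.0 * x_norm))

print(round(gompertz_rank(0.0), 3))  # 0.632 -> upper limit of R_c^k
print(round(gompertz_rank(1.0), 3))  # 0.127 -> lower limit of R_c^k
```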

Q6: How are the confidence indicators fused in Eq.6 and Eq.7? (R3)
A6: We apologize for the confusion. We generate class-wise fuzzy ranks R_c^k for the three performance indicators using the Gompertz function (Eq.6), where c denotes the class and k indexes the three performance indicators. We select the top m ranks for each class. For each class, we then compute the corresponding CCF_c and FR_c following Eq.7, where the summation is performed over k. Finally, we obtain CC_c by multiplying FR_c and CCF_c (Eq.8).
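
A loose sketch of our reading of this pipeline follows. The direction of the top-m selection and the exact forms of CCF_c and FR_c are assumptions inferred from A5 and A6 (the penalty values 0 and 0.632 follow A5); this is not the paper's Eq.7-8 verbatim.

```python
# Hedged sketch of the fuzzy fusion in A6: Gompertz ranks per indicator,
# top-m selection per class, penalties 0 (CCF) and 0.632 (FR) from A5,
# and CC_c = FR_c * CCF_c. Details are our interpretation.
import numpy as np

def fuse_confidences(indicators, m=2):
    """indicators: (num_classes, 3) normalized scores for the entropy,
    variance, and confidence indicators. Returns CC_c per class."""
    R = 1.0 - np.exp(-np.exp(-2.0 * indicators))       # fuzzy ranks R_c^k (Eq.6)
    cc = np.empty(indicators.shape[0])
    for c in range(indicators.shape[0]):
        top_m = set(np.argsort(R[c])[:m].tolist())     # m best (lowest) fuzzy ranks
        ccf = sum(indicators[c, k] if k in top_m else 0.0 for k in range(R.shape[1]))
        fr = sum(R[c, k] if k in top_m else 0.632 for k in range(R.shape[1]))
        cc[c] = fr * ccf                               # CC_c = FR_c * CCF_c (Eq.8)
    return cc
```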

Q7: How will choosing pixels on the boundary, inside the object, or purely randomly influence the performance? (R3)
A7: This is an interesting query. Boundary regions, which contain high-level semantic information shared between two adjacent classes, are difficult to segment accurately. Sampling the majority of pixels from such regions helps the model learn more contextual information about these confusing regions and may lead to improved performance. Sampling pixels only from inside the object may prevent the model from learning this feature. Our proposed method samples pixels randomly, helping the model learn both the global context of the objects and the boundary information. We will experiment extensively with these sampling strategies in the future.
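
To make the "purely random" class-wise sampling concrete, a minimal sketch follows; the function name and per-class rates are hypothetical, and it simply draws a fraction of each class's pixels uniformly at random, as A7 describes.

```python
# Minimal sketch of random class-wise pixel sampling (names and rates are
# hypothetical): draw a fraction of each class's pixels uniformly at random.
import numpy as np

def sample_pixels_per_class(label_map, rates, seed=0):
    """label_map: (H, W) integer labels; rates: {class_id: fraction to sample}.
    Returns {class_id: (n_c, 2) array of sampled (row, col) coordinates}."""
    rng = np.random.default_rng(seed)
    sampled = {}
    for c, rate in rates.items():
        coords = np.argwhere(label_map == c)           # all pixels of class c
        if len(coords) == 0:
            continue                                   # class absent from this image
        n = min(len(coords), max(1, int(rate * len(coords))))
        idx = rng.choice(len(coords), size=n, replace=False)
        sampled[c] = coords[idx]
    return sampled
```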




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided reasonable explanations on the key issues of experimental validation. The rebuttal also addressed some of the main clarification questions raised by the reviewers. The reviewers are satisfied after the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The major issue in this paper was the lack of results on the under-performing classes, as noted by a majority of reviewers. This was critical for a work that specifically aims to improve segmentation accuracy for these classes.

    This question, among others, was addressed in the rebuttal. Hence, thanks to the reviewers’ suggestions, I believe that the paper has now been improved and is ready to be accepted at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After carefully reading the reviews and the authors’ rebuttal, I think that the authors have done a good job in their responses, addressing the most important comments. Despite their extensive experiments, however, I would have appreciated a deeper analysis with other losses designed to tackle class imbalance (such as focal loss). Having said this, I believe the idea of tackling class imbalance in semi-supervised segmentation is novel and can benefit the community. Thus, I recommend acceptance of this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3

