
Authors

Qingyue Wei, Lequan Yu, Xianhang Li, Wei Shao, Cihang Xie, Lei Xing, Yuyin Zhou

Abstract

Medical imaging has witnessed remarkable progress but usually requires a large amount of high-quality annotated data which is time-consuming and costly to obtain. To alleviate this burden, semi-supervised learning has garnered attention as a potential solution. In this paper, we present Meta-Learning for Bootstrapping Medical Image Segmentation (MLB-Seg), a novel method for tackling the challenge of semi-supervised medical image segmentation. Specifically, our approach first involves training a segmentation model on a small set of clean labeled images to generate initial labels for unlabeled data. To further optimize this bootstrapping process, we introduce a per-pixel weight mapping system that dynamically assigns weights to both the initialized labels and the model’s own predictions. These weights are determined using a meta-process that prioritizes pixels with loss gradient directions closer to those of clean data, which is based on a small set of precisely annotated images. To facilitate the meta-learning process, we additionally introduce a consistency-based Pseudo Label Enhancement (PLE) scheme that improves the quality of the model’s own predictions by ensembling predictions from various augmented versions of the same input. To improve the quality of the weight maps obtained through multiple augmentations of a single input, we introduce a mean teacher into the PLE scheme, which helps to reduce noise in the weight maps and stabilize their generation process. Our extensive experimental results on public atrial and prostate segmentation datasets demonstrate that our proposed method achieves state-of-the-art results under semi-supervision. Our code is available at https://github.com/aijinrjinr/MLB-Seg.
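For orientation, the two core mechanisms described in the abstract can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch under our own assumptions; all names, shapes, and transforms are illustrative and not taken from the released code.

import torch
import torch.nn.functional as F

def weighted_bootstrap_loss(logits, y_init, y_pseudo, w_init, w_pseudo):
    # logits: (B, C, H, W) network outputs. y_init / y_pseudo: (B, H, W)
    # integer masks from the clean-data model and from PLE. w_init / w_pseudo:
    # (B, H, W) per-pixel weight maps produced by the meta-process.
    ce_init = F.cross_entropy(logits, y_init, reduction="none")      # (B, H, W)
    ce_pseudo = F.cross_entropy(logits, y_pseudo, reduction="none")  # (B, H, W)
    return (w_init * ce_init + w_pseudo * ce_pseudo).mean()

def ple_pseudo_labels(teacher, x, augs, inv_augs):
    # Pseudo Label Enhancement: average the teacher's softmax outputs over
    # several augmented views, mapped back by the inverse transforms.
    with torch.no_grad():
        probs = [inv(teacher(aug(x)).softmax(dim=1))
                 for aug, inv in zip(augs, inv_augs)]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)  # (B, H, W) hard labels

Here teacher stands in for the mean teacher, typically an exponential moving average of the student network’s weights, which is what stabilizes the ensembled predictions and the resulting weight maps.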

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_18

SharedIt: https://rdcu.be/dnwC2

Link to the code repository

https://github.com/aijinrjinr/MLB-Seg

Link to the dataset(s)

https://github.com/yulequan/UA-MT/tree/88ed29ad794f877122e542a7fa9505a76fa83515/data

https://zenodo.org/record/8026660


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a semi-supervised method for medical image segmentation that combines (i) a meta-learning module to adaptively combine GT-based estimated labels and bootstrapped predicted labels with (ii) an aggregation and regularization scheme for data augmentation, (iii) all within a teacher-student learning framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed application of meta-learning to learn the pixelwise linear combination weights for two different possible labels, namely the labels estimated by a network trained on the GT and the labels predicted by the network being trained itself, seems to be a good idea for adaptive label combination.
    2. The paper further proposes a method to aggregate various data augmentations, which seems to boost performance when incorporated within a teacher-student network framework.
    3. The proposed method seems to achieve state-of-the-art performance, and in particular demonstrates large improvements over its baseline method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main figure, Fig. 1, is neither very informative nor clear. The visual descriptions seem to be lacking overall.
    2. While it is understandable, due to the complexity of the method, the mathematical notations in the paper could be improved for better clarity. For instance, the various notations in the subscript and superscript, such as \tilde{n}, are hard to follow.
    3. The qualitative analysis does not clearly demonstrate the detailed effect of the proposed method.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    As the authors provide code for their method, it seems unlikely that reproducibility will be a problem.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. I think it would be better to exclude details and simplify Fig. 1 to convey the main idea more clearly, and figures visualizing the details of each component should be added.
    2. The mathematical notations in the paper should be improved for better clarity.
    3. The qualitative analysis should be improved to clarify the detailed effect of the proposed method.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper seems to present solid technical methods that contribute to quantitative improvements.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The rebuttal has clarified the concerns raised by a different reviewer and the AC.



Review #3

  • Please describe the contribution of the paper

    The paper presents a method for semi-supervised image segmentation to improve the utilization of both labeled and unlabeled data in training a segmentation network. The framework proposes a label weighting scheme based on loss gradient directions, and leverages a teacher-student model to enhance the stability of pseudo labels by ensembling predictions from various augmented versions of the same input.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • An impactful problem is tackled and the combination of proposed components, namely gradient-based sample weighting and consistency-based pseudo label enhancement via teacher-student networks, seems technically sound and suited to the problem.

    • The experimental scope is fairly thorough including comparison with recent methods and a detailed ablation study.

    • Two different public medical datasets are used for evaluation, and the source code is provided, which grants high reproducibility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • A weakness could be the choice of backbone network, which is limited to UNet++. I think the integration of more recent methods (such as transformer-based segmentation architectures and a wider variety of backbones) could further add value to the manuscript.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper has high reproducibility and the dataset and code are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The proposed framework seems interesting and highly suited to the tackled semi-supervised segmentation task. The comparisons and ablation studies are thorough and relevant.

    • A power analysis for the results presented in Tables 1, 2, and 3 could add value to the manuscript, to show that the obtained improvements are statistically significant.

    • The choice of backbone network could be better explored, as some more recent network architectures may give better generalizability. Various backbone choices could be explored in the current version of the manuscript (if rebuttal time and the paper's space limit allow), or followed up in future extensions of the work.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed solution is an interesting integration of relevant methods and approaches. The experiments are fairly comprehensive. My concerns are the significance of the results and the limited exploration of backbone networks. Hence, my rating is weak accept.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have adequately addressed my concerns regarding significance of results and the choice of backbone model and hence I am inclined to change my rating to “accept”.



Review #4

  • Please describe the contribution of the paper

    The paper proposes a Meta-Learning for Bootstrapping Medical Image Segmentation (MLB-Seg) method for semi-supervised medical image segmentation, which involves a two-step process. In the first step, a segmentation model is trained on a small set of clean labeled images to generate initial labels for unlabeled data. In the second step, a per-pixel weight mapping system is introduced to dynamically assign weights to both the initialized labels and the model’s own predictions using a meta-process. A consistency-based Pseudo Label Enhancement (PLE) scheme is introduced to improve the quality of the model’s own predictions by ensembling predictions from various augmented versions of the same input. Moreover, a mean teacher is introduced into the PLE scheme to reduce noise in the weight maps. Extensive experimental results on public atrial and prostate segmentation datasets demonstrate that the proposed method achieves state-of-the-art results under semi-supervision.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A Meta-Learning for Bootstrapping Medical Image Segmentation (MLB-Seg) method for semi-supervised medical image segmentation. The paper is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The authors argue that other methods based on consistency regularization [12, 25] increase computational complexity and slow down the training process (second paragraph of Section 1). However, the proposed meta-learning strategy also introduces additional computational requirements, such as memory and training time.

    • The proposed method can only be applied to binary segmentation tasks.

    • There is no validation dataset in the experiments.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided code, so the reproducibility should be guaranteed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    • It is suggested to provide a computational efficiency analysis.

    • Why choose to use two independent weight maps rather than one weight map (w) and its complementary map (1 − w)?

    • It seems that each dataset contains only training and test splits, so the hyperparameters would have to be tuned on the test data. This goes against the principles of machine learning: in practice we cannot tune hyperparameters at test time because we should not have ground truths (except for evaluation purposes).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the comments in sections 6 and 9. I will raise my score given a satisfactory response.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Sorry for the wrong comment about the validation set. Regarding the other comments, i.e., computational efficiency and application to multi-class tasks, the authors promise to explore these in the future. Overall, this paper is not bad. After reading the comments from the other reviewers, I have changed my score from weak reject to weak accept.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    While the reviewers believe this work has some merits (i.e., R1, R2: interesting idea; R2: relevance of the tackled problem; R2: extensive experiments), they have also raised important concerns that need to be addressed. In particular, R3 has raised a very important concern related to the lack of a proper validation set. More concretely, standard machine learning practice recommends conducting the hyperparameter search, including early stopping, on an independent validation set. However, after reading the paper, it seems that the authors have employed a validation set (extracted from the training set) for this purpose. Furthermore, R3 encourages the authors to assess the performance of the proposed approach from a computational complexity standpoint, as the reviewer believes that the proposed approach may also incur an increase in computational costs. Furthermore, R1 considers that the notation could be further improved, whereas R2 emphasizes that the integration of newer segmentation backbones may strengthen the paper. Last, one important concern raised by this AC is the fairness of the comparisons reported in Tables 1 and 2. In particular, it is unclear whether all the methods were compared under the same conditions. For example, the samples used in training and testing seem to differ across methods, while the authors directly report the numbers from the original papers (see for instance [1] in Table 1, whose values represent the average of multiple runs; similarly, for prostate segmentation, the three test samples are extracted randomly). Thus, I strongly encourage the authors to address all these concerns in their rebuttal.




Author Feedback

We appreciate the valuable insights provided by the reviewers and the AC. We hope the following addresses their concerns.

R1 (accept)

  • Fig. 1 and notations: Our method adjusts the weights between initialized labels and pseudo labels using meta-learning in three steps at each iteration: 1) compute the loss with the initialized weight maps, 2) update the weight maps using clean data via a meta-process, 3) update the model using the new weight maps (a minimal sketch of these steps is given after this set of responses). In our notation, ‘n’ stands for the initialized labels’ weight maps, ‘p’ for the pseudo labels’, ‘∼’ for updated weight maps, and ‘*’ for optimal weight maps. Fig. 1 and the notation will be simplified.

  • Lack of detailed qualitative analysis: Our method adaptively adjusts the weight of pixels based on their reliability. We demonstrate this through weight map visualizations in Fig. 2 of the manuscript. Comparing sections (VII), (V) and (IV), it’s evident that higher weights are allotted to accurately predicted pseudo-labeled pixels that were initially mislabeled. We will provide more examples and clearer illustrations.
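As referenced in the Fig. 1 response above, here is a minimal sketch of one meta-iteration in PyTorch-style Python (using torch.func). This is a simplified illustration under our own assumptions; the names and the plain-SGD inner step are illustrative and not taken from the released code.

import torch
import torch.nn.functional as F
from torch.func import functional_call

def weighted_loss(logits, y_init, y_pseudo, w_i, w_p):
    # Per-pixel weighted combination of the two supervision sources.
    return (w_i * F.cross_entropy(logits, y_init, reduction="none")
            + w_p * F.cross_entropy(logits, y_pseudo, reduction="none")).mean()

def meta_iteration(model, opt, x, y_init, y_pseudo, w_i, w_p,
                   x_clean, y_clean, meta_lr=1e-3):
    # Step 1: loss on the noisy batch with the current weight maps.
    w_i = w_i.clone().requires_grad_(True)
    w_p = w_p.clone().requires_grad_(True)
    params = dict(model.named_parameters())
    loss = weighted_loss(functional_call(model, params, (x,)),
                         y_init, y_pseudo, w_i, w_p)

    # Virtual SGD step; create_graph=True keeps the dependence of these
    # "fast" parameters on the weight maps.
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
    fast = {k: v - meta_lr * g for (k, v), g in zip(params.items(), grads)}

    # Step 2: meta-update of the weight maps from the clean-data loss. A pixel
    # whose loss gradient aligns with the clean data's ends up weighted higher.
    clean_loss = F.cross_entropy(functional_call(model, fast, (x_clean,)), y_clean)
    g_i, g_p = torch.autograd.grad(clean_loss, (w_i, w_p))
    w_i = torch.clamp(w_i - meta_lr * g_i, min=0.0).detach()
    w_p = torch.clamp(w_p - meta_lr * g_p, min=0.0).detach()

    # Step 3: real update of the model with the refreshed weight maps.
    opt.zero_grad()
    weighted_loss(model(x), y_init, y_pseudo, w_i, w_p).backward()
    opt.step()
    return w_i, w_p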

R2 (weak accept)

  • Statistical significance analysis: Statistical significance is evident in comparison with the baseline UNet++, e.g., a lower standard deviation (3.64% vs. 14.83% in Dice) with a p-value of 0.009 (< 0.05) on the LA dataset (a sketch of such a paired test is given after this set of responses). More analysis will be included in the next version.

  • The backbone choice: MLB-Seg also performed well with different backbones (TransUnet, SwinUnet, and DeepLabv3+) on the PROMISE12 dataset (Dice: 82.16%/74.38%/76.29%) compared to the baseline models (Dice: 77.24%/65.56%/67.77%). Notably, with the same resolution of 256, MLB-Seg using TransUnet greatly surpassed Unet++ (82.16% vs. 78.27%).
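As referenced in the statistical significance response above, a paired test over per-case Dice scores could be computed as follows. This is purely illustrative Python with randomly generated placeholder numbers, not the paper's data; a paired test is appropriate because both methods are evaluated on the same test cases.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder per-case Dice scores (%), purely for illustration.
baseline = rng.normal(loc=78.0, scale=5.0, size=20)            # e.g. a baseline
improved = baseline + rng.normal(loc=3.0, scale=2.0, size=20)  # e.g. the new method

res_t = stats.ttest_rel(improved, baseline)   # paired t-test over the same cases
res_w = stats.wilcoxon(improved, baseline)    # non-parametric alternative
print(f"paired t-test p = {res_t.pvalue:.4f}; "
      f"Wilcoxon signed-rank p = {res_w.pvalue:.4f}")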

R3 (weak reject)

  • Computation cost: We did not claim our method to be computation-efficient in our manuscript and apologize for this misleading impression; this will be clarified in the updated manuscript. While MLB-Seg does not add model parameters or inference time, it does extend the training period due to the inner optimization loop required for learning the weight maps. However, this process yields significant improvements over consistency-regularization methods (+4.44%/3.15%/2.78%/2.12%/0.98% compared to [25], [6], [22], [12], [23], respectively). Improved computational efficiency is a future goal, which we will clarify in an added discussion of efficiency.

  • Can only be applied to binary classes: Our model can be adapted to multi-class segmentation, where the weight maps become multi-channel.

  • Why two independent weight maps: Two sets of hyper-parameters are necessary to dynamically adjust the contribution of different training samples and to explore the optimal combination of the loss terms. Optimizing only one set of weight maps could lead to inferior results. We will add this result in the next version to clarify this point.

  • Absence of validation set and tuning on test set: We assure the reviewers that we did not tune hyper-parameters on the test datasets in any experiment. While the LA dataset lacks a validation split, the PROMISE12 dataset has one. For hyper-parameters (e.g., learning rate, decay), we follow [25, 30] to set them initially for the main segmentation network and the meta-learner, with only slight tuning on PROMISE12’s validation set due to our method’s insensitivity to these parameters. For the LA experiments, we used these hyper-parameters directly and used the last checkpoint for inference.

AC

  • Fairness in comparison: For the LA dataset, we adhere to the same 80 cases for training and 20 for testing, and report single-run results, following [29, 25, 9, 6, 22, 12, 23] in Tab. 1. We will include multi-run results for LA to ensure a fair comparison with [1]. For PROMISE12, we strictly follow the setting of [15], which used a random split of 40 training, 4 validation, and 6 test cases. We were unable to follow the exact same split since [15] did not provide any split information. We will either request this information or replicate their results using our split for fairness and update Tab. 2.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After reading the reviews and the rebuttal, I believe that the authors have positively addressed the concerns raised during the review process. In particular, the responses given to the questions related to the fairness of the experiments and the training protocol show that the authors follow standard and well-established practices in the literature. The remaining comments are minor and can be easily addressed in the camera-ready version. Thus, reconciling the reviews and the authors’ rebuttal, I recommend the acceptance of this work.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

     This paper proposes a good technical contribution for semi-supervised medical image segmentation. The authors were able to clear up the issues raised by the reviewers (in particular an important one related to the experimental protocol), and all reviewers seemed fully satisfied with the rebuttal. Hence I recommend acceptance of this paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a semi-supervised segmentation method that bootstraps from an initial set of labeled training data using per-pixel weight maps and consistency-based self-enhancement. The work has also been described as technically suited to an impactful problem. The reviews raised concerns about the complexity of the method, computational efficiency, comparison with recent methods, and the statistical relevance of the results. The rebuttal may be considered evasive on a few key points by deferring them to future work. The newly provided standard deviation shows an important improvement (from 14.83% to 3.64%), which may require further explanation. For all these reasons, and situating the work with respect to other submissions, the recommendation is towards acceptance.


