
Authors

Ke Zhang, Xiahai Zhuang

Abstract

Cardiac segmentation is an essential step for the diagnosis of cardiovascular diseases. However, pixel-wise dense labeling is both costly and time-consuming. Scribbles, as a form of sparse annotation, are more accessible than full annotations. However, it is particularly challenging to train a segmentation network with weak supervision from scribbles. To tackle this problem, we propose a new scribble-guided method for cardiac segmentation, based on the Positive-Unlabeled (PU) learning framework and shape consistency regularization, termed ShapePU. To leverage unlabeled pixels via PU learning, we first present an Expectation-Maximization (EM) algorithm to estimate the proportion of each class in the unlabeled pixels. Given the estimated ratios, we then introduce marginal probability maximization to identify the classes of unlabeled pixels. To exploit shape knowledge, we apply cutout operations to training images and penalize inconsistent segmentation results. Evaluated on two open datasets, i.e., ACDC and MSCMRseg, our scribble-supervised ShapePU surpassed the fully supervised approach by 1.4% and 9.8% in average Dice, respectively, and outperformed the state-of-the-art weakly supervised and PU learning methods by large margins.
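As a rough illustration of the mixture-proportion step described in the abstract, the following minimal NumPy sketch implements a classic prior-adaptation EM (in the spirit of Saerens et al., 2002), initialized with the class frequencies of the labeled pixels as the authors describe in their rebuttal. The function name and the exact update rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def estimate_class_priors(probs, labeled_freqs, n_iters=50, tol=1e-5):
    """EM estimate of the class proportions among unlabeled pixels (sketch).

    probs: (N, K) softmax outputs of the network on N unlabeled pixels.
    labeled_freqs: (K,) class frequencies of the scribble-labeled pixels,
        used both as the initialization and as the reference priors.
    """
    labeled_freqs = np.asarray(labeled_freqs, dtype=np.float64)
    priors = labeled_freqs.copy()
    for _ in range(n_iters):
        # E-step: rescale the network posteriors to the current priors.
        w = probs * (priors / labeled_freqs)
        post = w / w.sum(axis=1, keepdims=True)
        # M-step: the new priors are the mean posterior responsibilities.
        new_priors = post.mean(axis=0)
        if np.abs(new_priors - priors).max() < tol:
            return new_priors
        priors = new_priors
    return priors
```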

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_16

SharedIt: https://rdcu.be/cVRYY

Link to the code repository

https://github.com/BWGZK/ShapePU

Link to the dataset(s)

https://zmiclab.github.io/zxh/0/mscmrseg19/data.html

https://github.com/BWGZK/CycleMix/tree/main/MSCMR_scribbles

https://www.creatis.insa-lyon.fr/Challenge/acdc/

https://vios-s.github.io/multiscale-adversarial-attention-gates/data




Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a weakly supervised segmentation approach utilizing scribble-guided annotation, a positive-unlabeled learning framework and shape consistency regularization. The proportion of each segmentation class in the unlabeled pixels is estimated via EM. The approach is evaluated on two open data sets and reportedly outperforms other supervised as well as weakly supervised approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Novel ways to ease time- and labor-intensive annotation work are very important, especially in the medical domain, to allow doctors to focus their time directly on patients.
    • The paper introduces a novel multi-class PU learning framework with an interesting integration of shape information in the loss function.
    • Results are demonstrated on publicly available datasets, are accompanied by an ablation study, and are compared to different tiers of contender methods with superior performance (even on par with the fully supervised approach).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I’m missing a quantification of the time savings compared to fully supervised methods, which is a major motivation for this work. Is the annotation-time/accuracy trade-off defensible? In the end, doctors want the most accurate model, even if this means tedious annotation work.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work was evaluated on a publicly available dataset. Following the checklist, I assume that the code will be available after acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    • As stated above, my major concern is the quantification of the time savings possible with the proposed approach. Is the time/accuracy trade-off defensible?
    • Besides being a steady standard for medical image segmentation, is there a reason to stick with U-Net? There are several modifications and/or alternatives published with demonstrated superior segmentation performance (e.g., nnU-Net).
    • How do you explain the drop in performance when combining Cutout and PU without including shape in your ablation study? Please add a short discussion of this to the respective section.
    Minor:
    • There seems to be a typo here: “We randomly divided the 45 images into 25 training images, 5 validation images, and 20 test images.” Either you used 50 images or a different train/val/test split.
    • Table 1: I assume the significance is always given in comparison to the previous model? A clarification in the caption or text would be appreciated.
    • Figure 3: The images are pretty small. If there is enough space, larger images would be appreciated.
    • Table 2: I assume HD is Hausdorff distance. Please add it to the caption.
    • Is there a reason for not highlighting statistically significant differences in Tables 2 and 3? For consistency, I’d suggest adding these indications here too.
    • Figure 4: It would be interesting to also include the corresponding scribble annotations.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel approach to reducing the annotation workload, a sound presentation of results, an ablation study, and a comparison to other state-of-the-art methods.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I still vote for accepting this work, of course depending on the changes the authors promised in their rebuttal answers.



Review #2

  • Please describe the contribution of the paper

    The authors propose a new scribble-guided method for cardiac segmentation with weak supervision, based on the positive-unlabeled (PU) framework and a shape regularization that penalizes inconsistent segmentation results. Their method makes use of an Expectation-Maximization (EM) algorithm to estimate the proportion of each class in the unlabeled pixels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    They propose a new learning framework that exploits unlabeled data, using an Expectation-Maximization (EM) algorithm to estimate the proportion of each class in the unlabeled pixels. The background is strong and well posed, and their experiments demonstrate an advantage over other methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks a clinical test to show real-world robustness.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper can be reproduced, and data can be collected to compare against new algorithms.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors provide enough insight into their methods and a good number of experiments with comparisons against other methods. Perhaps when they release their code, the method will be completely understood. An explainability analysis is documented, so the reader can understand the advantages of the method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors explain their paper, show all the experiments and work done, and compare and quantify their method against other existing methods. Reproducibility is important, provided the authors release the code.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    In this work, the authors propose a weakly supervised framework that uses scribbles to train a method for segmenting the LV, RV, and MYO in cardiac MR images. The method involves an EM algorithm for estimating the mixture proportions, PU learning to identify the classes of unlabelled pixels by maximising marginal probability, and a consistency constraint obtained by cutting out specific regions and enforcing cutout equivalence. They also perform a comparative analysis with prior methods on two different datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clearly written introduction and motivation
    2. Novelty of the method: Using a probabilistic PU learning framework combined with shape/texture information for weak supervision is promising for this application. It would be interesting to see whether a weighted version of the PU loss would be more helpful (as future work).
    3. Evaluation on multiple datasets: The evaluation is done on multiple datasets and the results look good, though the main statistically significant improvement seems to be with respect to the UnetF method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Errors in the equations: There seem to be errors in the condition used for deriving the main equation (1) of the paper: $\sum_{j=1}^m p_l(c_j \mid x) = 1$ should be replaced by $\sum_{j=1}^m p_u(c_j \mid x) = 1$ for equation (1) to be valid. Also check equations (3) and (5) in the supplementary material (the numerator and denominator seem exchanged, and $p_u$ is replaced by $p_l$). Also, given that we do not know which classes the unlabelled pixels belong to, the authors should explain how the assumption for Eq. (1) can hold.
    2. Also, in the loss function the authors denote the positive and unlabelled pixels by $\omega$ and $\bar{\omega}$, but they haven’t used the unlabelled pixels in the loss equation; they seem to have calculated the marginal probability only for the positive pixels. Can they explain why?
    3. Lack of details regarding differences between datasets: While it is good that the authors evaluated on multiple datasets, more information should be provided on how the datasets differ and what measures were taken to reduce domain shift. Is ACDC also post-gadolinium? How is this handled (any normalisation)? What about differences due to pathological conditions (e.g., myocardiopathy)?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details are mentioned and the authors have used a standard UNet model for this framework. A few more details, such as how long the EM algorithm took and how many iterations it needed (was it performed at each epoch?), would be helpful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    With reference to my above comments regarding weaknesses, here are my comments:

    1. Please clarify the equations for the EM section and the PU loss section.
    2. Why should it be shape and not texture? When the cutout is performed, the model could perceive it as a change in shape or in texture, since the texture information is missing in the cutout region; hence the term ‘shape’ is quite misleading. Also, how helpful is cutout? It would be helpful to see the results in extreme cases, e.g., distortion or a non-circular MYO.
    3. Provide more details regarding the shift in characteristics between the two datasets and a brief discussion of how these shifts were reduced. Samples from the datasets could be included in the supplementary material.
    4. If possible, it would be good to see the cases where the method fails or gives lower performance.
    5. The results report greater improvement in HD values (by the way, average or 95th-percentile HD?) than in Dice. Why is this? Was the boundary smoother with the proposed method, leading to lower HD? The authors should discuss the reason behind this, given that the improvement in HD is much larger than that in Dice.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method proposed in the paper has potential but is weighed down by the lack of clarity in the equations and assumptions. Nevertheless, the results look promising and the method seems to work on multiple datasets, outperforming existing methods.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The authors have sufficiently responded to my earlier comments and have agreed to correct the typos in the equations and the PU framework. Also, they have agreed to include a brief discussion of the shape consistency and the role of cutout in the framework. Hence my modified review (7: strong accept) holds on the condition that the authors make the specified changes.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents a framework for heart segmentation that relies on scribbles to ease the annotation process. The framework is said to be formulated as a positive-unlabeled (PU) problem. The authors also propose a regularization scheme for shape consistency. The method was evaluated on two different datasets, showing very good results.

    Overall, the paper has received positive comments. Nevertheless, there are some aspects of the methodology that need to be clarified. Please refer to the remarks regarding the method’s formulation (R3) and the experiments (R1, R3). Please also consider the following comments:
    1) PU framework: In the strict definition of the PU framework, Eq. (2) should not be considered a “PU loss”, since it only considers unlabelled pixels. Moreover, it only considers candidate “negative” samples, as it uses those that do not belong to class $c_j$. In this sense, the term PU loss is misleading. See, for instance, ref. [9] in the paper, which uses the terms $L^+$ and $L^-$ in a similar situation and makes things clearer.
    2) EM step: How does the method guarantee that the background (i.e., image content that is not the heart) does not interfere in the mixture proportion estimation? How is the EM algorithm initialized?
    3) As highlighted by R3, it is not clear how the cutout can guarantee shape consistency. Should it be renamed to global consistency?
    4) The related works section refers to general PU frameworks. Previous works in the medical imaging community have used the PU framework; please acknowledge them [1].

    [1] Zuluaga, M. A., Hush, D., Delgado Leyton, E. J., Hoyos, M. H., & Orkisz, M. (2011, September). Learning from only positive and unlabeled data to detect lesions in vascular CT images. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 9–16).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We appreciate the constructive and thoughtful comments from our reviewers. Here are our responses in detail.

  1. Method concerns

1.1 EM step

MR, R3: the influence of the background? How can the assumption for Eq. (1) hold? We apologize for the unclear statement. The mixture ratios of all classes, including the background, are estimated in the EM step. For the condition of Eq. (1), given that an unlabeled pixel x must belong to one of the classes, the probabilities of all classes sum to 1. After the EM estimation, we calculate the PU loss for the foreground classes. We will further clarify this in the revision.

MR: how is the EM estimation initialized? Thanks. As described on page 4, lines 2-5, the class priors of the unlabeled data are initialized with the class frequencies of the labeled data. We will further clarify this in the revision.

R3: errors in the equations? We apologize for the typos and will correct them in the revision.

R3: how long did EM take and how many iterations? Thanks. For the MSCMR dataset, EM takes 0.0094 seconds and 6.9 iterations on average per batch of size 4.

1.2 PU loss

MR: ‘PU loss’ is misleading. Thanks. We agree that $L^+$ and $L^-$ are clearer and will revise accordingly in the revision.

R3: is the PU loss calculated for positive labels? We apologize for the typo in Eq. (2), where $\omega$ should be replaced with $\bar{\omega}$. The PU loss is calculated for the unlabeled pixels.
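For concreteness, below is a minimal PyTorch sketch of one plausible reading of such a loss: the average predicted probability of each class over the unlabeled pixels $\bar{\omega}$ is matched to the EM-estimated mixture ratios. The function name, the masking scheme, and the cross-entropy form are illustrative assumptions, not the paper's exact Eq. (2).

```python
import torch

def marginal_pu_loss(logits, unlabeled_mask, est_ratios, eps=1e-8):
    """Sketch: match predicted class marginals over unlabeled pixels
    to the EM-estimated mixture ratios.

    logits: (B, K, H, W) network outputs.
    unlabeled_mask: (B, H, W) bool tensor, True where no scribble exists.
    est_ratios: (K,) EM-estimated class proportions of unlabeled pixels.
    """
    probs = torch.softmax(logits, dim=1)            # (B, K, H, W)
    mask = unlabeled_mask.unsqueeze(1).float()      # (B, 1, H, W)
    # Mean predicted probability of each class over unlabeled pixels only.
    marginals = (probs * mask).sum(dim=(0, 2, 3)) / mask.sum().clamp(min=1.0)
    # Cross-entropy between the estimated ratios and predicted marginals.
    return -(est_ratios * torch.log(marginals + eps)).sum()
```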

1.3 Shape consistency

MR, R3: ‘shape’ is misleading. Thanks. We agree that cutout changes both the shape and the texture. As suggested by the MR, we will rename ‘shape consistency’ to ‘global consistency’ to make it clearer in the revision.
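To make the cutout-equivalence idea concrete, here is a minimal PyTorch sketch, assuming the consistency is enforced as a mean-squared error between the segmentation of the cutout image and the segmentation of the original image, compared outside the erased region. The function name, the fixed box, and the MSE form are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def cutout_consistency_loss(model, images, box):
    """Sketch of cutout equivalence (the renamed 'global consistency').

    images: (B, C, H, W) input batch; box: (y0, y1, x0, x1) cutout
    coordinates (in practice, sampled randomly per image).
    """
    y0, y1, x0, x1 = box
    cut = images.clone()
    cut[:, :, y0:y1, x0:x1] = 0                     # erase the region

    probs_full = torch.softmax(model(images), dim=1)
    probs_cut = torch.softmax(model(cut), dim=1)

    # Segmenting the cutout image should agree with the segmentation of
    # the original image everywhere outside the erased region.
    keep = torch.ones_like(probs_full)
    keep[:, :, y0:y1, x0:x1] = 0
    diff = (probs_full - probs_cut) ** 2 * keep
    return diff.sum() / keep.sum().clamp(min=1.0)
```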

  2. Experiment concerns

R1: trade-off between annotation time and accuracy. Thanks. For the MSCMR dataset, the time consumed by full annotation is about 6 times that of scribble annotation. In our experience, scribble-guided ShapePU achieves much better performance than the fully supervised method for the same annotation effort. The trade-off between annotation time and accuracy is indeed meaningful, and we will investigate it in future work.

R1: why UNet? Thanks. Although a combination of strategies could achieve better results, here we use the standard UNet as a proof of concept and for validation. Our method is also applicable to other backbones, including nnUNet. We would like to verify this in future work.

R1, R3: drop in performance when combining Cutout and PU without shape; how helpful is Cutout? Thanks. When combined with Cutout, PU (without shape) improved the average Dice marginally from 83.3% to 83.4%. Cutout enhances the localization ability, but may change the shape of the target structure. Therefore, it can be difficult for the segmentation model to learn the shape priors, leading to the performance drop on some structures. When combined with shape consistency, which overcomes this disadvantage by requiring cutout equivalence, the performance is evidently better. We will include this discussion in the revision.

R1, minor comments: typo, significance test, figures, and HD. Thanks. We will revise the manuscript accordingly.

R2: clinical test. Thanks. We agree that a clinical test is important and would like to study it in future work.

R3: improvement in HD. Yes. ShapePU identifies the classes of unlabeled pixels and penalizes inconsistent segmentations. Therefore, it produces smoother boundaries with fewer outliers, leading to lower HD in general.

R3: dataset differences. Yes. MSCMRseg is more challenging than ACDC, as LGE CMR segmentation is per se more complex and the training set is smaller. We processed the two datasets in the same way and normalized the intensity to zero mean and unit variance. We will clarify this in the revision.

R3: failure cases. Thanks. We will supply more poorly performing cases for illustration.

  3. Related work concerns

MR: include PU frameworks from the medical imaging community. Thanks. We will acknowledge these PU frameworks in the revision.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have properly addressed all the concerns raised by the reviewers and meta-reviewer. Nevertheless, R3 highlights the need to address all the points the authors committed to in the rebuttal for the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work initially received positive comments from the reviewers, with all three of them giving a score of 6. The authors have partially addressed most of the concerns raised during the review process, with several responses promising to address these comments in the revised version. Despite this, I believe that with a few important modifications (mostly methodology clarifications and a discussion of prior PU methods, as well as of the impact of cutout in the current framework) this work can be accepted at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the first round of reviews, the reviewers highlighted the fact that this paper proposes a valuable methodological contribution, which is also shown to perform well in practice through a sound protocol and ablation study.

    In their remarks, clarifications were requested on some aspects of the methodology. The authors seem to have provided sufficient and convincing explanations of their methodology in the rebuttal; therefore, I recommend acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


