
Authors

Meng Zhou, Zhe Xu, Kang Zhou, Raymond Kai-yu Tong

Abstract

Deep learning-based segmentation typically requires a large amount of data with dense manual delineation, which is both time-consuming and expensive to obtain for medical images. Consequently, weakly-supervised learning, which attempts to utilize sparse annotations such as scribbles for effective training, has garnered considerable attention. However, such scribble-supervision inherently lacks sufficient structural information, leading to two critical challenges: (i) while achieving good performance in overall overlap metrics such as Dice score, the existing methods struggle to perform satisfactory local prediction because no desired structural priors are accessible during training; (ii) the class feature distributions are inevitably less-compact due to sparse and extremely incomplete supervision, leading to poor generalizability. To address these, in this paper, we propose the SC-Net, a new scribble-supervised approach that combines Superpixel-guided scribble walking with Class-wise contrastive regularization. Specifically, the framework is built upon the recent dual-decoder backbone design, where predictions from two slightly different decoders are randomly mixed to provide auxiliary pseudo-label supervision. Besides the sparse and pseudo supervision, the scribbles walk towards unlabeled pixels guided by superpixel connectivity and image content to offer as much dense supervision as possible. Then, the class-wise contrastive regularization disconnects the feature manifolds of different classes to encourage the compactness of class feature distributions. We evaluate our approach on the public cardiac dataset ACDC and demonstrate the superiority of our method compared to recent scribble-supervised and semi-supervised methods with similar labeling efforts.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_13

SharedIt: https://rdcu.be/dnwxV

Link to the code repository

https://github.com/Lemonzhoumeng/SC-Net

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed a scribble-based weakly supervised model for cardiac segmentation. The authors combined several existing methods, including contrastive learning, mix-up and super-pixel learning. The proposed model achieved good performance on the ACDC dataset. Overall, each claim is supported by corresponding results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Using super-pixel learning to expand the scribble-like sparse annotations is interesting and the experiments demonstrated its effectiveness.
    2. The paper organization is clear.
    3. Shrinking the class distribution is useful in such a setting, which also raises another idea for semi-/weakly supervised learning compared to the mainstream consistency-based methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For the evaluation of local prediction, it would be better to add other metrics such as topology similarity, or to show more results that demonstrate it. Otherwise, the authors should revise the sentences that over-claim.
    2. The paper should discuss some related works and state how it differs from them. [1] Wu, Zhonghua, et al. “Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation.” ECCV 2022, 2022. [2] Wu, Yicheng, et al. “Mutual consistency learning for semi-supervised medical image segmentation.” Medical Image Analysis 81 (2022): 102530.
    3. The authors should revise the title to specify ‘cardiac segmentation’, since they only conducted experiments on ACDC. Future work should include more challenging objects.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. See the above weaknesses;
    2. The contribution statement can be made clearer by separating it from the discussion part.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper shows that class-level compactness is also important for sparsely supervised medical tasks, which complements the mainstream consistency-based methods. Considering that it is useful in medical scenarios, this paper should be marginally above the acceptance bar of MICCAI.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Good rebuttal responses. Most issues need to be addressed in the camera-ready version and I think the final paper should be above the acceptance bar of MICCAI so I change the score to accept.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a scribble-based weakly supervised segmentation algorithm, which is evaluated on the ACDC cardiac segmentation dataset. The method is based on a previously proposed framework, and includes two novel components: a contrastive regularisation term and superpixels for propagating scribble labels to other image regions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is evaluated against a wide range of baselines, including semi-, weakly and fully supervised settings.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper is in large parts very similar to [14], which is not apparent from the manuscript without looking at [14] closely.

    • As far as I can assess, the paper contains two novel components, a contrastive regularisation term and a so-called “scribble walking” step, where scribble labels are propagated to other regions in the image. Unfortunately, it is not explained how this “walking” works even though this seems to be a central contribution. The section on contrastive regularisation is quite condensed and difficult to follow. Instead, a large portion of the manuscript is dedicated to reproducing the method description of [14].

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe the method is not explained clearly enough (see weaknesses) to be easily reproduced without having code available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Can 2018 (https://arxiv.org/pdf/1807.04668.pdf) is relevant prior work which has also been discussed thoroughly in this manuscript’s predecessor [14], and I suggest including it as related work in this submission too.

    • The overall segmentation performance seems to be quite a bit too low across the board, including the supervised case. For example the performance in Can 2018 is considerably better, both for the weakly and fully supervised case. These numbers (all on ACDC) should be comparable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In its current form, I believe the actual contributions of this paper are not distilled and presented clearly enough (and distinguished from prior work) to make the paper interesting for the MICCAI audience.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I accept the authors’ justification that different data splits may be responsible for performance drops. I still find the performance drop quite large but I’m willing to drop this argument here.

    Regarding the scribble walking: Thank you for the clarification. I believe I understand this now, but believe my confusion came from the term “walking”, when really, there is no iteration taking place but a simple single-step expansion. I would suggest using a different, less loaded term.

    Regarding relationship to [14]: Large parts of the manuscript beyond Sec. 2.1 (e.g. the experimental evaluation and comparison methods) draw heavily on [14]. However, the evaluation and comparisons themselves are sound and useful even though they already appeared in [14].



Review #3

  • Please describe the contribution of the paper

    The authors present a weakly supervised segmentation model which utilizes scribbles and expands them with superpixels (SLIC) to more structurally correct pseudo-labels. They also propose a class-wise contrastive regularization, where class prototypes are used to compress the class-specific latent features using contrastive learning. The authors build on top of an existing dual-decoder architecture [1] and extend it with their superpixel-based scribbles and class-wise contrastive normalization. They show that adding these two components to the dual decoder leads to an improvement on the public cardiac ACDC dataset with respect to existing semi-supervised and weakly supervised methods.

    [1] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision, MICCAI 2022
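
The prototype idea summarized above can be illustrated with a minimal sketch. It is not the paper's Eq. 4: the softmax-over-prototypes form, the `temperature` parameter, and the function names `class_prototypes` and `prototype_contrastive_loss` are assumptions made for illustration. Only the use of class prototypes and cosine similarity comes from the reviews and the rebuttal.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def class_prototypes(features, labels):
    """Return the mean feature vector (prototype) of every class."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def prototype_contrastive_loss(features, labels, temperature=0.1):
    """Assumed loss: pull each feature towards its own class prototype
    and away from the other prototypes (softmax over cosine similarities)."""
    protos = class_prototypes(features, labels)
    total = 0.0
    for feat, label in zip(features, labels):
        logits = {c: cosine(feat, p) / temperature for c, p in protos.items()}
        top = max(logits.values())
        log_denominator = top + math.log(
            sum(math.exp(v - top) for v in logits.values())
        )
        total += log_denominator - logits[label]
    return total / len(features)


# Compact class features give a lower loss than fully mixed ones.
compact = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
labels = [0, 0, 1, 1]
print(prototype_contrastive_loss(compact, labels))
print(prototype_contrastive_loss(mixed, labels))
```

In the mixed case both prototypes coincide, so the loss cannot separate the classes; a real implementation would operate on batched network features rather than Python lists.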

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Paper presentation: The paper is extremely well-written, with almost no typos, a good structure, and ideas that are easy to follow. Each design decision is well argued and supported by ablation studies. The authors have also presented all of the literature related to their method.

    (2) Particularly strong evaluation: The authors compare their method to many semi- and weakly-supervised methods and show how previous methods fail to capture local structural features.

    (3) Well-supported design decisions: The two novel components added to the existing dual-decoder framework, (*) scribble-based walking and (**) class-wise contrastive learning, are well-justified and supported by extensive ablation studies. The authors show that (*) and (**) are both important by alternately excluding them from their model and visualizing failure cases of previous approaches which are mitigated by their proposed method. This supports their decision to include class-wise contrastive learning as a novel component of the dual decoder architecture [1].

    [1] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision, MICCAI 2022

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) How correct are the pseudo-labels compared to other approaches: While the authors show that adding scribble walking improves the performance, it is not clear whether this SLIC superpixel-based approach produces valid pseudo-labels, or at least to what extent these pseudo-labels are valid. Fig. 2 only shows the final segmentation results after training on the pseudo-labels. However, if the authors showed how their SLIC pseudo-labels (with uncertainty-pruning) compare in quality to the other pseudo-labels (Random Walker, pCE etc.), this would further justify their decisions. I am curious why the authors did not include this information.

    (2) The model is quite sensitive to hyperparameter changes (see Fig. 3 lambda_sNR). Small parameter changes lead to large performance drops (changing lambda from 0.005 to 0.001 leads to 7% change in DSC).

    (3) Although the model leads to a significant improvement in HD95 compared to the dual decoder proposed in DBMS [1], it is unclear whether a simple largest-connected-component post-processing would not have been sufficient to improve the previous approach. The qualitative results indicate that the high HD95 in DBMS is caused by small isolated regions outside of the cardiac region, which could be solved by largest-connected-component analysis.

    (4) The method is only evaluated on one dataset and might have a limited generalizability to other datasets and imaging modalities, although the authors argue that including contrastive learning improves the generalizability to unseen data. This requires further experiments (on unseen datasets) to be supported as a claim.

    [1] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision, MICCAI 2022
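
The largest-connected-component post-processing raised in weakness (3) can be sketched as follows. This is a minimal illustration on a 2D binary mask with 4-connectivity; the function name `largest_connected_component` and the choice of connectivity are assumptions, and neither the paper nor DBMS is stated to use this step.

```python
from collections import deque


def largest_connected_component(mask):
    """Keep only the largest 4-connected foreground region of a 2D binary mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * width for _ in range(height)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


# The isolated pixel in the bottom-right corner is removed.
prediction = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
print(largest_connected_component(prediction))
```

For a multi-class cardiac prediction, the same function would be applied to one binary mask per class, which is also an assumption about how such a baseline would be set up.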

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is good. The authors build on top of the dual decoder framework DBMS [1] which has open-source code and it could easily be extended with the description of their method in the paper. They will also publish their code. The dataset they use is publicly available.

    [1] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision, MICCAI 2022

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Following the weaknesses I have these comments on how to improve the paper:

    (1) To support the decision to use superpixels as a guidance for scribble-walking, it would make sense to compare the correctness of these pseudo-labels with the methods which the authors compare to in Table 1 (Random Walker, pCE etc.). This could be done both quantitatively and qualitatively, by considering the overlap of the pseudo-labels with the ground-truth labels and also by visually inspecting the pseudo-labels. This would also be consistent with the paper’s structure and experiments (Table 1 and Fig. 2) and would strongly support why the authors decided to go for SLIC instead of any other classical approach to generate the pseudo-labels.

    (2) It is not clear whether the addition of this superpixel pruning leads to any improvement, e.g. if a superpixel is covered by more than one scribble or none, consider the area as “undefined”. I am curious whether the authors have a justification of why this helps. Does it function as an uncertainty-pruning and how much does including this improve the results?

    (3) Visual results (qualitative experiments) are needed to support that “thanks to the more compact feature distributions, our method reduces false-positive predictions, as indicated in the red box”. The final result in Fig. 2 does show a reduction of false positives. However, this cannot be directly attributed to the compact feature distributions, or at least is not supported by the qualitative experiments. It is interesting why the authors included a row in Table 1 to show how their model decreases the HD95 metric when the contrastive learning is omitted but they have not done the same in their qualitative experiments. This would greatly improve the paper and its claims.

    Small comments:

    (*) A nice addition to Table 1 and Figure 2 would be to highlight the methods with the red bounding box (non-cardiac structures) also in Table 1 since they are the ones with the particularly large HD95 values. This would make it clear that the quantitative and qualitative experiments support the authors’ decision to include this class-wise contrastive learning approach.

    (*) Typo after Eq. (2): Despite effectiveness -> Despite its effectiveness
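
The quantitative pseudo-label comparison suggested in comment (1) could look like the sketch below. It measures the per-class Dice overlap between a pseudo-label map and the ground truth, plus the fraction of pixels that received any label. The function name `pseudo_label_quality`, the `unlabeled=-1` marker, and the toy arrays are hypothetical; the source does not say which metric the authors eventually used.

```python
def pseudo_label_quality(pseudo, truth, classes, unlabeled=-1):
    """Return (per-class Dice, coverage) for a 2D pseudo-label map.

    Pixels marked `unlabeled` belong to no class, so they lower the Dice
    of the class they should have received and lower the coverage.
    """
    flat_pseudo = [p for row in pseudo for p in row]
    flat_truth = [t for row in truth for t in row]
    dice = {}
    for c in classes:
        in_pseudo = sum(1 for p in flat_pseudo if p == c)
        in_truth = sum(1 for t in flat_truth if t == c)
        both = sum(
            1 for p, t in zip(flat_pseudo, flat_truth) if p == c and t == c
        )
        if in_pseudo + in_truth > 0:  # skip classes absent from both maps
            dice[c] = 2.0 * both / (in_pseudo + in_truth)
    coverage = sum(1 for p in flat_pseudo if p != unlabeled) / len(flat_pseudo)
    return dice, coverage


pseudo = [[1, 1, -1], [2, 2, -1]]
truth = [[1, 1, 1], [2, 0, 0]]
dice, coverage = pseudo_label_quality(pseudo, truth, classes=[0, 1, 2])
print(dice, coverage)
```

Running the same function on pseudo-labels from each candidate generator (for example superpixel expansion and Random Walker) would give the side-by-side comparison the reviewer asks for.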

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper has some limitations, e.g. evaluation on only one dataset and some missing justification behind certain aspects of their methods, the methodology, presentation, and comparison to related methods is excellent. Many of the flaws of the paper I have listed could easily be corrected in the camera-ready version, and some can be delegated to future work. The paper is an excellent extension of previous work DBMS [1] and would be a good contribution to the conference.

    [1] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision, MICCAI 2022

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have addressed some of my comments but have not addressed my concerns regarding qualitative experiments and the quality of the pseudo-labels. Hence, I am keeping my rating.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper introduced a scribble-based weakly supervised model for cardiac segmentation. The method integrates contrastive learning, mix-up and super-pixel learning, and is evaluated on the ACDC cardiac segmentation dataset. The paper provides a strong experimental evaluation. However, in the rebuttal, the authors need to clarify the novel components of this work in comparison to [14] (as the proposed method is largely based on [14]). How does the “scribble walking” step work? This step seems to be a central contribution, but is not well described.




Author Feedback

We are glad that reviewers find our work “interesting” (R1), “excellent extension of DBMS [14]” (R3), “clear organization” (R1,R3), “extremely well-written” (R3), “well-supported design decisions” (R1,R3), and “strong evaluation” (AC,R3). We thank R1 and R3 for their support. Our responses (mainly for R2) to major concerns are as follows.

Q1 (AC,R2): The authors need to clarify the novel components compared to DBMS [14] (as the proposed method is largely based on DBMS [14]). A1: (1) We’d like to emphasize that the SOTA dual-decoder model [14] (i.e., DBMS [1] mentioned by R3) is our backbone framework, yet, we focus on different challenges. We explicitly introduce DBMS [14] in “Sec 2.1: Preliminaries and Basic Framework” and state “we build upon [14]”. Then, to address two critical challenges posed by scribble sparse annotations, we propose two novel components added to DBMS, i.e., Superpixel-guided Scribble Walking (Sec. 2.2) for structural prior augmentation and Class-wise Contrastive Regularization (Sec. 2.3) to encourage compact class feature distributions. Thus, we respectfully disagree with the comment saying “large parts of the paper are to reproduce the method description of [14]”. We only introduce [14] in Sec. 2.1 as “Preliminaries”. (2) We are grateful that R1 and R3 appreciate our extension and the novel components. Especially, R3 expressed great support by stating “the paper is an excellent extension of DBMS [14] and would be a good contribution to MICCAI” and “each design is well-supported”.

Q2 (R2): (1) How does the “scribble walking” step work? This step seems to be a central contribution but not well described. (2) The section on contrastive regularization is quite condensed and difficult to follow. A2: We apologize for any confusion caused. Yet, it is hard for us to determine which aspects require specific clarification without enough information. We try to clarify the overall processes as follows. (1) In Sec. 2.1 and Fig. 1, we clearly describe the ‘scribble walking’ process. We employ the widely used Simple Linear Iterative Clustering (SLIC) algorithm to generate superpixel clusters. Then, if a superpixel cluster overlaps with a scribble s_r, the label y_r of s_r walks towards the pixels contained in this cluster. Yet, if a superpixel cluster overlaps with multiple scribbles or none at all, the pixels within that cluster remain unlabeled. We believe the description is intuitive. (2) We refer to R3 for another explanation: the class-wise contrastive regularization employs class prototypes to compress class-specific features using contrastive learning. Fig. 1 and Eq. 4 illustrate and formulate this concept clearly. The similarity metric used in Eq. 4 is the common cosine similarity. We hope our response can help R2 better understand the paper. Code will be released to help follow the details.
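
The single-step expansion described in A2 can be sketched as below. This is an illustrative reading of the rebuttal, not the released code: the superpixel map is taken as a given input (it could come from any SLIC implementation), "multiple scribbles" is interpreted as scribbles of more than one class, and pixels that already carry a scribble keep their label. The names `scribble_walk` and `UNLABELED` are hypothetical.

```python
from collections import defaultdict

UNLABELED = -1  # hypothetical marker for pixels without a scribble label


def scribble_walk(superpixels, scribbles, unlabeled=UNLABELED):
    """Expand scribble labels to whole superpixels in a single step.

    superpixels: 2D list of superpixel ids (for example from a SLIC run).
    scribbles:   2D list of class ids, `unlabeled` where there is no scribble.
    A superpixel touched by exactly one scribble class inherits that class.
    Superpixels touched by several classes or by none are left as they are.
    """
    # Collect the set of scribble classes found inside each superpixel.
    classes_in = defaultdict(set)
    for sp_row, sc_row in zip(superpixels, scribbles):
        for sp, sc in zip(sp_row, sc_row):
            if sc != unlabeled:
                classes_in[sp].add(sc)

    walked = []
    for sp_row, sc_row in zip(superpixels, scribbles):
        out_row = []
        for sp, sc in zip(sp_row, sc_row):
            found = classes_in.get(sp, set())
            if len(found) == 1:
                out_row.append(next(iter(found)))  # label walks to the pixel
            else:
                out_row.append(sc)  # ambiguous or empty: keep original value
        walked.append(out_row)
    return walked


superpixels = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3]]
scribbles = [[1, -1, -1, -1], [-1, -1, 2, -1], [1, 2, -1, -1]]
print(scribble_walk(superpixels, scribbles))
```

In this toy input, superpixels 0 and 1 are filled with classes 1 and 2, superpixel 2 is ambiguous because it contains two classes, and superpixel 3 has no scribble, so both stay as they were.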

Q3 (R2): The performance in the relevant work CAN is considerably better, both for the weakly and fully supervised. A3: We will add a discussion for CAN (2018). Note that CAN used 160/40 public volumes for training/validation (w/o cross-validation) and tested on the in-house 100 images using the challenge server. Yet, we use the 200 public volumes to perform five-fold cross-validation. Thus, such performance differences are normal.
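
The protocol difference in A3 comes down to how the 200 public volumes are partitioned. The sketch below shows a plain five-fold split in which every fold trains on 160 volumes and tests on the remaining 40, so each volume is tested exactly once. The function name `k_fold_splits` is hypothetical, and the source does not say whether volumes are shuffled or grouped by patient before splitting, so neither is done here.

```python
def k_fold_splits(case_ids, k=5):
    """Yield (train_ids, test_ids) for each of k contiguous folds."""
    n = len(case_ids)
    # Distribute any remainder over the first folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_ids = case_ids[start:start + size]
        train_ids = case_ids[:start] + case_ids[start + size:]
        yield train_ids, test_ids
        start += size


volumes = list(range(200))  # placeholder ids for the 200 public volumes
for fold, (train_ids, test_ids) in enumerate(k_fold_splits(volumes)):
    print(fold, len(train_ids), len(test_ids))
```

Scores averaged over these five held-out folds are computed on different test data than a single 160/40 split evaluated on a separate challenge server, which is why the two sets of numbers are not directly comparable.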

Q4 (R3): Fig. 3 λ_sNR: changing λ_sNR from 0.005 to 0.001 leads to 7% change in DSC. A4: R3 may have read the wrong line. The dotted line denotes DSC, and the solid line denotes 95HD. Thus, changing λ_sNR from 0.001 to 0.005 leads to only a 1% change in DSC, but it improves 95HD by 2.1, given a suitably higher weight for the augmented structural prior.

Q5 (R3): Quality comparison of the augmented label. A5: We compare the accuracy of the augmented label with random walker, resulting in an improvement from 0.78 to 0.81.

Other minor concerns will also be carefully revised. We sincerely thank R1 and R3 for their suggestions, which are very constructive for our camera-ready and extended versions.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall the rebuttal provided good responses, including clarification of scribble walking and relationship with [14], and the reviewers acknowledged the merits of the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    It is an interesting paper, although I still have two main concerns. First, the innovation compared to [14] might be incremental. Second, I am not quite convinced that scribble-supervised segmentation can be implemented for clinical image analysis, where precise segmentation is needed. In computer vision, perhaps scribbles can differentiate objects of completely different classes. However, in medical imaging, most images would present similar anatomical structures. For medical image segmentation, it is the differences within the same structure (i.e. class) that matter. I am not quite sure scribbles can encode such subtle yet important differences.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper describes a method for weakly-supervised learning of segmentation tasks based on scribble annotations. The reviewers raised several concerns regarding overlap with prior work and comparison to other methods. Based on the rebuttal, two out of three reviewers have increased their ratings, from 5 to 6 and from 3 to 4, respectively. One reviewer raises some remaining issues regarding overlap with existing work but identified the novelty of the contributions in the current work. Overall, I deem the work acceptable for MICCAI.


