
Authors

Qiuhui Chen, Yi Hong

Abstract

Recently, weakly-supervised image segmentation using weak annotations like scribbles has gained great attention, since such annotations are much easier to obtain compared to time-consuming and label-intensive labeling at the pixel/voxel level. However, because scribbles lack structure information of region of interest (ROI), existing scribble-based methods suffer from poor boundary localization. Furthermore, most current methods are designed for 2D image segmentation, which do not fully leverage the volumetric information if directly applied to image slices. In this paper, we propose a scribble-based volumetric image segmentation, Scribble2D5, which tackles 3D anisotropic image segmentation and improves boundary prediction. To achieve this, we augment a 2.5D attention UNet with a proposed label propagation module to extend semantic information from scribbles and a combination of static and active boundary prediction to learn ROI’s boundary and regularize its shape. Extensive experiments on three public datasets demonstrate Scribble2D5 significantly outperforms current scribble-based methods and approaches the performance of fully-supervised ones. Our code is available online.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_23

SharedIt: https://rdcu.be/cVRY5

Link to the code repository

https://github.com/Qybc/Scribble2D5

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents Scribble2D5, a weakly supervised deep learning approach to segment volumetric medical images based on scribbles. The paper proposes augmenting a 2.5D attention UNet with a label propagation module to improve boundary predictions. Additionally, the authors extend an active boundary loss formulation to 3D.

    The method is evaluated on three datasets and results show that the proposed method outperforms the state of the art on two of the datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This method achieves SOTA performance in 3D using weakly supervised scribble annotations, narrowing the gap to fully supervised methods. The use of scribble annotations provides practical utility, as it greatly reduces the workload of manual segmentation for 3D images. The authors successfully extend previous 2D methods to 3D, where few alternatives exist.

    The authors propose a label propagation module that uses a combination of existing methods to generate both pseudo masks and pseudo boundaries. Including these existing methods may provide additional, somewhat orthogonal signals that help generate more accurate pseudo labels.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Much of the work is extending [1] to 3D with a few modifications, including:

    • 3D backbone network (2.5D attention UNet)
    • different edge detector
    • use of attention blocks
    • additional static pseudo mask

    It appears that the previous SOTA scribble methods were designed and trained for 2D images rather than 3D. The method InExtremIS, which was designed for volumetric segmentation, shows similar performance to the proposed method using extreme points, which is arguably a less informative signal. It would be better to compare with methods designed for 3D, such as [2].

    Scribbles are generated for the VS and CHAOS datasets through “erosion”. This process is neither cited nor further explained. It should be clearly described to ensure that the generated scribbles are representative of real scribbles. Additionally, scribbles could be generated on the ACDC dataset and compared to the scribbles provided by experts.

    Why did the authors introduce an extra hyperparameter lambda_2 for active boundary loss that is not present in the 2D formulation (Chen et al., 2019)?

    [1] J. Zhang, X. Yu, A. Li, P. Song, B. Liu, and Y. Dai, “Weakly-Supervised Salient Object Detection via Scribble Annotations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

    [2] H. Kervadec, J. Dolz, M. Tang, E. Granger, Y. Boykov, and I. Ben Ayed, “Constrained-CNN losses for weakly supervised segmentation,” Medical Image Analysis, vol. 54, pp. 88–99, 2019.

    [3] X. Chen, B. M. Williams, S. R. Vallabhaneni, G. Czanner, R. Williams, and Y. Zheng, “Learning Active Contour Models for Medical Image Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Datasets are publicly available and sufficient information is provided in the paper to reproduce the results with modest hardware. The abstract states that code is available online.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please check spelling and grammar:

    • Page 3: “In this way, we have a Label Propagation Module (LPM) to generate 3D pseudo labels from scribbles and images for ROI segmentation and static boundary prediction, respectively.” This sentence is confusing.
    • Page 4: “Specifically, At the”: “At” should not be capitalized.
    • Page 8: “… methods and reduce the performance gap…”: “reduce” should be “reduces”.

    Text in figure 1 is too small to read. The caption should also be more descriptive, describing the overall architecture.

    In Equation 3, Volume_out should use (1-u) instead of u. Equation 4 has a duplicate L_seg term; one of these terms should specify that the mask is refined.
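
    For context, the 2D active contour loss of Chen et al. [3] writes its region term with the prediction inside the contour and its complement outside. A 3D analogue consistent with the remark on Equation 3 might read as follows. The symbols u (predicted foreground probability), v (target mask), and c_1, c_2 (inside and outside reference values) are assumed notation here, not copied from the paper under review:

```latex
\mathrm{Volume}_{in}  = \Bigl|\sum_{i,j,k} u_{i,j,k}\,\bigl(c_1 - v_{i,j,k}\bigr)^2\Bigr|,
\qquad
\mathrm{Volume}_{out} = \Bigl|\sum_{i,j,k} \bigl(1 - u_{i,j,k}\bigr)\,\bigl(c_2 - v_{i,j,k}\bigr)^2\Bigr|
```

    Written this way, the outside term is weighted by (1-u), so it only penalizes voxels that the network assigns to the background.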

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper produces SOTA results that provide practical utility by reducing the requirements on manual segmentations. Much of the contribution is adapting other methods to work in 3D.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper presented Scribble2D5, a method for 3D anisotropic image segmentation using scribble annotations (a type of weak supervision). A label propagation module and an active boundary loss were proposed to improve performance in terms of Dice score and overall boundary smoothness. Extensive experiments were carried out on multiple datasets to validate the effectiveness of Scribble2D5.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Scribble annotation is a promising type of supervision signal in terms of expense and performance. The paper proposed a scribble-based method and tested on three different medical image analysis datasets, which is of certain clinical feasibility. The experimental protocols were overall clear and results were extensive.

    In terms of technical contribution, the paper integrated several methods such as SLIC, 2.5D attention UNet, HED, ASPP, into the context of scribble supervised learning. A 3D active boundary loss was proposed based on its 2D ancestor. The ablation studies were well conducted.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The scribble generation protocol of VS and CHAOS datasets may need elaborating. If it is a well-known pipeline, then the paper should cite it. Otherwise, the paper should at least state the parameters that affected the scribble quality, such as length, thickness, distance to boundary, etc.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be fairly straightforward to reproduce. Some key modules of the proposed method are already open-sourced, such as SLIC, HED, etc.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper was technically sound and easy to follow overall. The authors exploited low-level boundary evidence and mid-level supervoxel evidence to achieve high-level segmentation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experiments were comprehensive and showed promising results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    Image annotations sometimes are not easy to obtain in practice because annotating at the image pixel-/voxel-level is time-consuming and needs medical expertise to provide high-quality annotations. The proposed method tries to address these challenges by presenting a scribble-based volumetric image segmentation, Scribble2D5, which tackles 3D anisotropic image segmentation and improves boundary prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • It proposes a Scribble2D5 network for segmenting medical image volumes, using only sparse scribbles for training.

    • It proposes a label propagation module for 3D pseudo mask generation and an active boundary loss to regularize 3D segmentation results.

    • It provides some reasonable results on three public datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It misses some related references in the proposed topic.

    • The evaluation and comparison are not sufficient.

    • Some scribble generations are not clinically realistic.

    More detailed comments are given in the following Sec. 8.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems okay. The code is/will be available online.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    There are some concerns about this paper:

    • It misses some related references:
    • Zhao, T., & Yin, Z. (2020). Weakly supervised cell segmentation by point annotation. IEEE Transactions on Medical Imaging, 40(10), 2736-2747.

    • Luo, X., Hu, M., Liao, W., Zhai, S., Song, T., Wang, G., & Zhang, S. (2022). Scribble-Supervised Medical Image Segmentation via Dual-Branch Network and Dynamically Mixed Pseudo Labels Supervision. arXiv preprint arXiv:2203.02106.

    • Zhang, K., & Zhuang, X. (2022). CycleMix: A Holistic Strategy for Medical Image Segmentation from Scribble Supervision. arXiv preprint arXiv:2203.01475.

    It is unclear what the major advantage of the proposed method is compared with the above related work.

    • In the comparison, they only include two scribble-based segmentation methods. More state-of-the-art weakly-supervised methods need to be compared, such as:
    • [UNetD] Valvano, G., Leo, A., & Tsaftaris, S. A. (2021). Learning to segment from scribbles using multi-scale adversarial attention gates. IEEE Transactions on Medical Imaging, 40(8), 1990-2001.

    • Zhang, K., & Zhuang, X. (2022). CycleMix: A Holistic Strategy for Medical Image Segmentation from Scribble Supervision. arXiv preprint arXiv:2203.01475.

    • As for the mask-based segmentation comparison, they only include two U-Net methods. More recent methods should be included, such as:
    • UNetD [Valvano et al., 2021]
    • PostDAE [Larrazabal et al., 2020]
    • ACCL: Adversarial Constrained-CNN Loss … [Zhang et al., 2020]

    • For the VS and CHAOS datasets, the scribbles are simulated rather than drawn by doctors or clinicians, so the evaluation results on these two datasets are not very convincing.

    • More visualization results and qualitative evaluation should be provided.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a weakly-supervised volumetric image segmentation network, Scribble2D5. This method tries to reduce the performance gap between weakly-supervised and fully-supervised segmentation methods. However, there are some concerns about missing related references and insufficient comparison and evaluation.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Thanks for the authors’ feedback and explanation. After reading the rebuttal, I am on the fence about this paper. If the ACs and all other reviewers agree to accept this paper, I am fine with it.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents a 2.5D scribble-based strategy to segment highly anisotropic 3D volumes. The main contributions are a label propagation module and an active boundary loss, and the method relies on assembling a set of well-established techniques. It achieves good performance when evaluated on three different datasets. I agree with the reviewers that the following aspects require further clarification:

    1) Positioning w.r.t. the state of the art: The idea of propagating a segmentation starting from scribbles has been previously proposed in the literature [1]. Similarly, all the reviewers have pointed to some state-of-the-art methods. Please position your work w.r.t. these works. In the discussion, you may omit those references that appeared after the MICCAI submission.

    2) Scribbles: The procedure through which the scribbles are generated is not fully clear. Please clarify.

    References

    [1] G. Wang et al. Slic-Seg: slice-by-slice segmentation propagation of the placenta in fetal MRI using one-plane scribbles and online learning (2015). In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 29-37).

    [2] Luo, X., Hu, M., Liao, W., Zhai, S., Song, T., Wang, G., & Zhang, S. (2022). Scribble-Supervised Medical Image Segmentation via Dual-Branch Network and Dynamically Mixed Pseudo Labels Supervision. arXiv preprint arXiv:2203.02106.

    [3] Zhang, K., & Zhuang, X. (2022). CycleMix: A Holistic Strategy for Medical Image Segmentation from Scribble Supervision. arXiv preprint arXiv:2203.01475.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We thank the reviewers and AC for their constructive comments.

[All] Positioning w.r.t. the state-of-the-art: We thank the reviewers and AC for pointing out Ref [1-3]. Among them, Ref [2&3] appeared on arXiv after the MICCAI submission. These methods mainly work on 2D slices when handling 3D images; [1] regularizes the volume size of the segmentation, but its network takes 2D slices as inputs. In contrast, our Scribble2D5 tackles 3D anisotropic image inputs directly, considering both the static and active boundary of the ROI’s shape. Our method outperforms them on the ACDC dataset: 0.914 dice vs 0.871 in [1] (segmenting the left ventricle only, as in [1]), 0.903 mean dice vs 0.872 in [2] (using 5-fold cross validation, as in [2]), and 0.896 mean dice vs 0.848 in [3] (using 35 subjects for training, as in [3]).

[1] Kervadec et al. ConstrainedCNN, MedIA 2019
[2] Luo et al. ScribbleSeg, arXiv 2022
[3] Zhang et al. CycleMix, CVPR 2022

[All] Scribble generation and comparison to manual scribbles: Following [4], we simulate scribbles by iterative morphological erosion and closing of the segmentation masks, which results in a one-pixel skeleton for each object. Since the resulting background scribble is winding, we use ITK-SNAP to annotate the background with 2-pixel-wide curves.

[4] Rajchl et al. Employing weak annotations for medical image analysis problems. arXiv 2017
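
To illustrate the general idea of erosion-based scribble simulation, here is a minimal 2D sketch. It is a simplification and not the authors' pipeline: it uses erosion only (no closing), stops just before the object would vanish rather than computing a true skeleton, and the function names `erode` and `simulate_scribble` are hypothetical.

```python
import numpy as np

def erode(mask: np.ndarray) -> np.ndarray:
    """One binary erosion step with a 4-connected (cross-shaped) element."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel survives only if it and its four direct neighbours are foreground.
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def simulate_scribble(mask: np.ndarray) -> np.ndarray:
    """Erode repeatedly and return the last non-empty core of the mask."""
    core = mask.astype(bool)
    while True:
        nxt = erode(core)
        if not nxt.any():      # the next step would delete the object
            return core
        core = nxt

# Toy example: a 5 x 11 rectangle shrinks to a one-pixel-wide midline.
mask = np.zeros((9, 15), dtype=bool)
mask[2:7, 2:13] = True
scribble = simulate_scribble(mask)
print(int(scribble.sum()), "scribble pixels out of", int(mask.sum()))
```

For elongated convex shapes such as this rectangle, the result is a thin line along the midline, which matches the qualitative description of simulated scribbles lying far from the boundary. Objects with parts of different thickness would need a proper skeletonization step.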

The ACDC dataset has manual scribbles from experts and is used here for comparison. 1) Regarding the size of scribbles, the generated ones for foreground ROIs occupy ~7.2% of a mask, while manual scribbles occupy ~11.7%. 2) Regarding the performance, using generated scribbles results in a 0.836 dice, while the real ones achieve 0.906 (as reported in the paper). The generated scribbles perform worse, probably because they lie along the midline of the ROI, far away from the boundary, and cover a smaller region. If manual scribbles for the foreground were available, the performance on the VS and CHAOS datasets could be further improved.
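
The two quantities compared here, scribble coverage of a mask and the Dice score, can be computed as in the sketch below. The helper names `coverage` and `dice` and the toy arrays are assumptions for illustration; the numbers are unrelated to the values reported in the rebuttal.

```python
import numpy as np

def coverage(scribble: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of foreground mask pixels/voxels marked by the scribble."""
    mask = mask.astype(bool)
    return float((scribble.astype(bool) & mask).sum() / mask.sum())

def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice score 2|A∩B| / (|A| + |B|); defined as 1.0 when both are empty."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else float(2.0 * (pred & target).sum() / denom)

# Toy data: a 10 x 10 mask, a 7-pixel scribble, and a prediction shifted by one column.
mask = np.zeros((12, 12), dtype=bool)
mask[1:11, 1:11] = True
scribble = np.zeros_like(mask)
scribble[5, 2:9] = True
pred = np.zeros_like(mask)
pred[1:11, 2:12] = True

print(f"coverage = {coverage(scribble, mask):.2f}")
print(f"dice     = {dice(pred, mask):.2f}")
```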

[AC] Label propagation from scribbles in Slic-Seg [G. Wang et al, MICCAI 2015]: Slic-Seg is an interactive learning method using online Random Forest and CRF. Its label propagation is limited to 2D, while we use SLIC in the preprocessing stage, which generates supervoxels from image volumes and obtains 3D pseudo masks for learning.

[R3] Missing some references and comparisons: We thank R3 for pointing out the references. [Zhao et al. TMI 2020] is a point-supervised method evaluated on a 2D cell dataset, while our method works on 3D images. The comparisons to CycleMix and ScribbleSeg have been discussed in the first response above.

[R3] Comparison to other methods like UNetD, PostDAE and ACCL: These methods are scribble-based methods with additional unpaired masks for learning shape priors. They are different from our two UNet baselines, which are trained using paired masks and used as upper bounds of our method. Under similar experimental settings (we use scribbles from 35 subjects, while these methods take an additional 35 unpaired masks), our method obtains a mean dice score of 0.896, vs 0.585 for UNetD, 0.676 for PostDAE, and 0.803 for ACCL.

[R1] InExtremIS shows similar performance with a less informative signal: Unlike annotating with scribbles, InExtremIS needs related expertise and special attention to locate six well-distributed points, i.e., up, down, left, right, front and back on the object boundary. Although its representation is simple, the signal is more informative because of the more accurate boundary locations, especially for the round brain tumors used in the InExtremIS experiments. However, for non-convex and complex shapes, scribbles are a more natural and simpler annotation choice.

[R1] Hyper-parameter lambda2 for ABL: The region outside the ROI in a medical image is typically heterogeneous and complex, so we use lambda2 to weight its contribution in the loss.
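
A small numerical sketch of how a separate outside weight changes the balance of the region terms. It assumes region terms in the style of the 2D active contour loss of Chen et al. (2019), with inside and outside reference values of 1 and 0; the names `region_terms`, `weighted_region_loss`, `lambda1`, and the toy values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def region_terms(u: np.ndarray, v: np.ndarray, c_in: float = 1.0, c_out: float = 0.0):
    """Inside/outside region terms; u = predicted probability, v = target mask."""
    volume_in = np.abs(np.sum(u * (c_in - v) ** 2))           # weighted by u
    volume_out = np.abs(np.sum((1.0 - u) * (c_out - v) ** 2))  # weighted by (1 - u)
    return float(volume_in), float(volume_out)

def weighted_region_loss(u, v, lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """Combine the two terms with separate weights (assumed form)."""
    volume_in, volume_out = region_terms(u, v)
    return lambda1 * volume_in + lambda2 * volume_out

v = np.array([1.0, 1.0, 0.0, 0.0])   # toy target: two foreground, two background voxels
u = np.array([0.9, 0.6, 0.2, 0.0])   # toy prediction

print(region_terms(u, v))
print(weighted_region_loss(u, v, lambda2=1.0))
print(weighted_region_loss(u, v, lambda2=0.5))
```

Lowering `lambda2` shrinks the contribution of the outside term relative to the inside term, which is the kind of control the response describes.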

We thank R1 for the writing suggestions and the issues are fixed. All our responses will be updated in the final version.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed the remarks raised during the first round of reviews. They are advised to include the elements presented in the rebuttal, in particular those concerning related methods, as part of the discussion of their paper. This will help position their work w.r.t. the existing literature and further highlight its contributions.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed and clarified most of the issues raised by the reviewers in the rebuttal, such as paper positioning in the SOTA methods, clarification of scribble generation method, etc. One reviewer changed his/her rating from weakly reject (4) to weakly accept (5), which makes all three reviewers on the positive side.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ answers in the rebuttal have addressed most comments raised during the review process. This includes positioning their work w.r.t. prior literature as well as providing quantitative measures to evaluate their performance, which show the superiority of the proposed work (I strongly encourage the authors to include this prior literature and its discussion in the camera-ready version). Nevertheless, the response to the InExtremIS concern is unconvincing. Results from this method are only shown for one dataset (and extracted from the paper, where the experimental setting might be different). Thus, the statement that scribbles might be a better choice for more complex shapes is not demonstrated empirically (note that being a more natural choice does not imply better performance). I believe that the statements regarding InExtremIS need to be revisited, and that a deeper empirical validation against the proposed approach is required to support these claims (but of course not in this current version).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


