
Authors

Tao Chen, Chenhui Wang, Hongming Shan

Abstract

Medical image segmentation is a challenging task with inherent ambiguity and high uncertainty attributed to factors such as unclear tumor boundaries and multiple plausible annotations. The accuracy and diversity of segmentation masks are both crucial for providing valuable references to radiologists in clinical practice. While existing diffusion models have shown strong capacities in various visual generation tasks, it is still challenging to deal with discrete masks in segmentation. To achieve accurate and diverse medical image segmentation masks, we propose a novel conditional Bernoulli Diffusion model for medical image segmentation (BerDiff). Instead of using the Gaussian noise, we first propose to use the Bernoulli noise as the diffusion kernel to enhance the capacity of the diffusion model for binary segmentation tasks, resulting in more accurate segmentation masks. Second, by leveraging the stochastic nature of the diffusion model, our BerDiff randomly samples the initial Bernoulli noise and intermediate latent variables multiple times to produce a range of diverse segmentation masks, which can highlight salient regions of interest that can serve as a valuable reference for radiologists. In addition, our BerDiff can efficiently sample sub-sequences from the overall trajectory of the reverse diffusion, thereby speeding up the segmentation process. Extensive experimental results on two medical image segmentation datasets with different modalities demonstrate that our BerDiff outperforms other recently published state-of-the-art methods. Source code is made available at https://github.com/takimailto/BerDiff.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_47

SharedIt: https://rdcu.be/dnwDV

Link to the code repository

https://github.com/takimailto/BerDiff

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    Introducing a novel conditional Bernoulli Diffusion model for medical image segmentation (BerDiff), which uses Bernoulli noise as the diffusion kernel to enhance the capacity of the diffusion model for binary segmentation tasks. This leads to more accurate segmentation masks. Demonstrating the ability of BerDiff to produce a range of diverse segmentation masks by leveraging the stochastic nature of the diffusion model, which can highlight salient regions of interest that can serve as a valuable reference for radiologists. This is achieved by randomly sampling the initial Bernoulli noise and intermediate latent variables multiple times. Proposing a more efficient segmentation process by allowing BerDiff to sample sub-sequences from the overall trajectory of the reverse diffusion, which speeds up the segmentation process. The experimental results on two medical image segmentation datasets with different modalities show that BerDiff outperforms other recently published state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Instead of using Gaussian noise, a novel conditional diffusion model based on Bernoulli noise is proposed for the discrete binary segmentation task, which achieves accurate and diverse medical image segmentation masks. Subsequences can be efficiently extracted from the overall trajectory of the backward diffusion, thus speeding up the segmentation process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks a discussion of the current state of related research in the field.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is very clear, with sufficient diagrams and formulas to illustrate the proposed algorithm, so this method is highly reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Descriptions of related studies could be added to clarify how the proposed method differs from them and which shortcomings of previous work it addresses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel conditional diffusion model is proposed, validated on two datasets, and compared with current methods, obtaining the best results. The algorithm is illustrated with detailed formulas and graphical aids, and the paper has a clear structure. The results are clearly visualized, which makes the effect of the algorithm easy to see.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a Bernoulli Diffusion model for medical image segmentation which uses Bernoulli noise as the diffusion kernel. With the proposed method, the authors can obtain diverse segmentation results. The proposed method includes a speed-up strategy. The experimental results achieve SOTA performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The authors propose a conditional diffusion model based on Bernoulli noise, which seems well suited to the medical segmentation task. The experimental results provided by the authors show that Bernoulli noise is better than the traditional Gaussian noise for this task.
    2. The theoretical introduction and the description of the algorithm are correct and sufficient.
    3. The experiments seem to demonstrate, to some extent, the effectiveness of the key contributions: the Bernoulli noise, the loss functions, and the diverse segmentation results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The claimed contribution that BerDiff can efficiently sample sub-sequences from the overall trajectory of the reverse diffusion does not seem novel enough.
    2. Judging from the qualitative comparison, the saliency segmentation mask after the Mean Operation does not appear to be a {0,1} binary mask. The authors should explain how the final {0,1} binary segmentation masks, which serve as salient regions, are obtained.
    3. The authors seem to omit the details of the proposed calibration function.
    4. The qualitative comparisons in Fig. 2, Fig. 3, and Fig. S1 are difficult to understand. The proposed method does not seem to show a clear advantage, for example in Rows 1, 2, and 4 of Fig. 3. It would be better if the authors zoomed in on some important details and added descriptions for comparison.
    5. The HM-IoU-21000Iteration-Gaussian result in Table 2 is 0.0020. Is the result really that bad?
    6. Although the Conclusion discusses the limitations and future work, the main contributions of the paper are not fully summarized.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The algorithm can be reproduced to some extent.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The strengths and weaknesses of the paper, as listed above.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a conditional Bernoulli Diffusion (BerDiff) model for accurate and diverse medical image segmentation. Experimental results show that the proposed BerDiff performs better than other competing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Instead of using the Gaussian noise, this paper first proposes to use the Bernoulli noise as the diffusion kernel to enhance the capacity of the diffusion model for binary segmentation tasks, resulting in more accurate segmentation masks. (2) The proposed BerDiff can efficiently sample sub-sequences from the overall trajectory of the reverse diffusion, thereby speeding up the segmentation process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The paper aims for accurate and diverse medical image segmentation but only investigates methods that produce diverse segmentation maps, not methods that produce accurate ones [1,2,3].
    (2) The authors did not compare the proposed BerDiff with other existing methods that can produce accurate segmentation maps [1,2,3]. The current experimental results are insufficient to support the claim that BerDiff achieves more accurate predictions than other SOTA methods.
    (3) It seems inappropriate to validate the proposed BerDiff and other competing methods on the BraTS2021 dataset. Each sample of the BraTS dataset has only a single annotation, which makes it difficult to evaluate the diversity of the model's predictions. It would be better to use the QUBIQ dataset or the KiTS dataset.
    (4) Some important details about calculating HM-IoU/Dice are missing. How is the ground-truth mask generated for calculating HM-IoU/Dice when each testing sample has multiple annotations? Is Soft Dice considered as a metric? It is the metric used for the QUBIQ challenge.

    [1] Zhang, et al. “Disentangling Human Error from the Ground Truth in Segmentation of Medical Images.” NeurIPS, 2020.
    [2] Ji, et al. “Learning Calibrated Medical Image Segmentation via Multi-rater Agreement Modeling.” CVPR, 2021.
    [3] Liao, et al. “Modeling Annotator Preference and Stochastic Annotation Error for Medical Image Segmentation.” arXiv 2022.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is not provided, but the description of the proposed method is clear enough to re-implement.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) Only one hyperparameter is needed in the loss function (Eq. 10) to achieve the same optimization effect, i.e., fixing lambda_KL to 1 and replacing lambda_BCE with lambda_BCE/lambda_KL. This reduces the workload of tuning hyperparameters.
    (2) The meaning of the bold and underlined numbers should be explained in the captions of Tables 1, 2, 3, and 4.
    (3) Some typos need to be fixed, such as “the concrete …” in line 2 of page 4, which should be “The concrete …”.
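
    The first point can be written out explicitly. The block below is only a sketch: it assumes that Eq. 10 is a weighted sum of a KL term and a BCE term with positive weights, and the symbols \mathcal{L}_{\mathrm{KL}}, \mathcal{L}_{\mathrm{BCE}}, and \mathcal{L}_{\mathrm{total}} are notation chosen here, not taken from the paper.

```latex
\mathcal{L}_{\mathrm{total}}
  = \lambda_{\mathrm{KL}}\,\mathcal{L}_{\mathrm{KL}}
  + \lambda_{\mathrm{BCE}}\,\mathcal{L}_{\mathrm{BCE}}
  = \lambda_{\mathrm{KL}}
    \left(
      \mathcal{L}_{\mathrm{KL}}
      + \frac{\lambda_{\mathrm{BCE}}}{\lambda_{\mathrm{KL}}}\,\mathcal{L}_{\mathrm{BCE}}
    \right),
  \qquad \lambda_{\mathrm{KL}} > 0 .
```

    Under that assumption, the positive factor \lambda_{\mathrm{KL}} does not change the minimizer; it only rescales the gradients, which can be absorbed into the learning rate. Only the ratio \lambda_{\mathrm{BCE}}/\lambda_{\mathrm{KL}} then needs tuning.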

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed model sounds novel but the survey and experiments are insufficient.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    This reviewer appreciates the authors’ efforts, and the feedback has addressed all my concerns. The authors compared the proposed method to other existing methods that aim to produce accurate segmentation maps, and the proposed method performs better than the competing methods. Moreover, they will evaluate BerDiff on the QUBIQ and KiTS datasets for comprehensive discussions and comparisons in the extended version.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a framework for medical image segmentation based on a Bernoulli Diffusion model, where the key idea is to use Bernoulli noise as the diffusion kernel. The reviewers found the method interesting; however, they raised some concerns and questions that should be addressed in the rebuttal. The authors should refer to relevant studies and discuss how the proposed method differs from them and which shortcomings of previous work it addresses, in order to highlight the proposed contribution. The issues mentioned by the reviewers regarding the hyperparameters (Eq. 10), the qualitative comparisons, the quantitative results, and the metrics used should be clarified. We understand that adding further comparisons at this stage is impractical. However, the authors should at least refer to the citations provided by one of the reviewers and discuss them with respect to the proposed approach. Please also address the paper’s reproducibility.




Author Feedback

We thank the reviewers for their thorough summaries and valuable feedback. We will address these in the future version and make the source code publicly available.

① Relevant studies (MR R2) The works most relevant to ours are EnsDiff [23], SegDiff [1], and MedsegDiff [24]. They all use the diffusion model to generate segmentation masks. However, they didn’t explore segmentation performance on ambiguous images, and all used Gaussian noise as the diffusion kernel. Our results show that the discrete Bernoulli kernel is better than the continuous Gaussian kernel for both ambiguous and deterministic segmentation tasks.

② Clarification on sub-sequences sampling (R3) BerDiff is compatible with various acceleration strategies in the literature. In our paper, we adopted DDIM’s sampling strategy and proved its feasibility. It can also adopt other more advanced ODE solvers. Therefore, sampling sub-sequences from the overall trajectory is still one merit of BerDiff but not the main contribution of our paper. We will weaken this claim in the future version.
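
As an illustration of what sampling a sub-sequence of the reverse trajectory means, the sketch below picks a short, decreasing list of timesteps out of the full schedule. It is not taken from the paper or its repository: the function name `timestep_subsequence` and the evenly spaced choice of steps are assumptions made here, and other spacings are possible.

```python
def timestep_subsequence(total_steps, num_sampling_steps):
    """Return a decreasing, roughly evenly spaced sub-sequence of timesteps.

    Hypothetical helper: timesteps are numbered 1..total_steps, and the
    reverse process would visit only the returned steps, in order.
    """
    if not 1 <= num_sampling_steps <= total_steps:
        raise ValueError("need 1 <= num_sampling_steps <= total_steps")
    # Integer arithmetic keeps the steps strictly increasing and ends at total_steps.
    steps = [(i + 1) * total_steps // num_sampling_steps
             for i in range(num_sampling_steps)]
    return steps[::-1]  # reverse diffusion runs from noisy (large t) to clean (small t)


short_schedule = timestep_subsequence(1000, 10)
print(short_schedule)
```

With a schedule like this, the sampler would evaluate the network once per listed timestep instead of once per original step, which is where the speed-up comes from.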

③ Calculation of saliency segmentation mask (R3) As you stated, the saliency segmentation mask after the Mean Operation is not a {0, 1} binary mask. Instead, it is a probability map with values ranging from 0 to 1. We can obtain the final binary segmentation masks by applying a threshold of 0.5.
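
The procedure described in this answer can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function name `saliency_and_binary_mask` and the toy samples are made up here, and whether the comparison at exactly 0.5 is strict is an assumption.

```python
import numpy as np


def saliency_and_binary_mask(samples, threshold=0.5):
    """Average sampled {0, 1} masks and threshold the result.

    samples: array of shape (n_samples, H, W) with binary masks.
    Returns the per-pixel mean (a probability map) and the thresholded mask.
    """
    samples = np.asarray(samples, dtype=np.float64)
    saliency = samples.mean(axis=0)           # values in [0, 1]
    # Assumption: strictly greater than the threshold counts as foreground.
    binary = (saliency > threshold).astype(np.uint8)
    return saliency, binary


# Four toy 2x2 samples standing in for four runs of the sampler.
samples = np.array([
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 1]],
    [[1, 1], [0, 0]],
])
saliency, binary = saliency_and_binary_mask(samples)
print(saliency)
print(binary)
```

In this toy case, the pixel that is foreground in only one of four samples has saliency 0.25 and is dropped from the binary mask.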

④ Details about calibration function (R3) Based on Eq. 7, the calibration function aims to calibrate the latent variable in the t-th step, y_t, to a less noisy latent variable in the previous step, y_{t-1}, which consists of two steps: 1) We estimate the segmentation mask y_0 by computing the absolute deviation between y_t and the estimated noise \hat{\epsilon}. 2) We estimate the distribution of y_{t-1} by calculating the Bernoulli posterior, p(y_{t-1}|y_{t},y_0), using \theta_{post}. We will make it clear in a future version.
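
The two steps above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes a Bernoulli forward kernel of the form q(y_t | y_{t-1}) = Bernoulli((1 - beta_t) y_{t-1} + beta_t / 2), so that q(y_{t-1} | y_0) = Bernoulli(alpha_bar_{t-1} y_0 + (1 - alpha_bar_{t-1}) / 2), and it derives the posterior from Bayes' rule. The function name `calibrate` and the arguments `beta_t` and `alpha_bar_prev` are hypothetical.

```python
import numpy as np


def calibrate(y_t, eps_hat, beta_t, alpha_bar_prev):
    """Sketch of a calibration step from y_t to the distribution of y_{t-1}.

    y_t:            current binary latent, values in {0, 1}
    eps_hat:        estimated noise, values in [0, 1]
    beta_t:         assumed noise level of step t, in (0, 1)
    alpha_bar_prev: assumed cumulative product of (1 - beta) up to step t-1
    """
    y_t = np.asarray(y_t, dtype=np.float64)
    eps_hat = np.asarray(eps_hat, dtype=np.float64)

    # Step 1: estimate y_0 as the absolute deviation between y_t and the noise.
    y0_hat = np.abs(y_t - eps_hat)

    # Step 2: Bernoulli posterior p(y_{t-1} = 1 | y_t, y_0) via Bayes' rule.
    # Likelihood of the observed y_t if y_{t-1} were 1 (resp. 0).
    like1 = y_t * (1 - beta_t / 2) + (1 - y_t) * (beta_t / 2)
    like0 = 1.0 - like1
    # Prior of y_{t-1} = 1 (resp. 0) given the estimated y_0.
    prior1 = alpha_bar_prev * y0_hat + (1 - alpha_bar_prev) / 2
    prior0 = 1.0 - prior1
    theta_post = like1 * prior1 / (like1 * prior1 + like0 * prior0)
    return y0_hat, theta_post


y_t = np.array([1, 0, 1, 0])
eps_hat = np.array([0.0, 0.0, 1.0, 1.0])
y0_hat, theta_post = calibrate(y_t, eps_hat, beta_t=0.1, alpha_bar_prev=0.5)

# A less noisy latent would then be drawn from the posterior parameters.
rng = np.random.default_rng(0)
y_prev = rng.binomial(1, theta_post)
```

Under these assumptions, `theta_post` plays the role of \theta_{post}: it is pulled toward y_t by the likelihood term and toward the estimated y_0 by the prior term.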

⑤ Qualitative comparison (MR R3) Regarding qualitative results, our BerDiff can generate the saliency segmentation mask to offer the confidence level of different ROIs and produce more accurate results than other methods. Moreover, our method can filter out false positives by low saliency. We will zoom in on the important details in the future version and add more examples.

⑥ Ablation results of diffusion kernel (MR R3) Regarding the HM-IoU-21000Iteration-Gaussian quantitative result, we’d like to highlight Fig. S3, which demonstrates that the use of Gaussian noise leads to slower convergence, resulting in poor performance at iteration 21000, whereas our BerDiff converges faster. This can be attributed to our method’s superior adaptability to the discrete segmentation task.

⑦ Metric calculation & Soft Dice (MR R4) We followed previous ambiguous segmentation methods [13,14,15] to use GED and HM-IoU to measure the agreement between prediction and ground-truth distributions. These two metrics directly calculate agreement using multiple annotated masks, eliminating the need for a single ground-truth mask. GED emphasizes diversity, while HM-IoU focuses on accuracy by averaging the IoU scores of matched pairs. Per your advice, the Soft-Dice results of ours and the work you suggested [1,2,3] on LIDC are 64.36/58.70/61.62/59.87, respectively. Our method is also better than [1,2,3]. In the future version, we will add the formal definitions of GED, HM-IoU, and Soft-Dice, and discuss these three papers.
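
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from a set of predicted masks and a set of annotations. It is not the authors' evaluation code. It assumes the widely used definition GED = 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')] with d = 1 - IoU, treats the IoU of two empty masks as 1, and finds the HM-IoU matching by brute force over permutations instead of the Hungarian algorithm, which is only practical for small, equal-sized sets.

```python
from itertools import permutations

import numpy as np


def iou(a, b):
    """Intersection over union of two binary masks (empty vs. empty counts as 1)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def ged(preds, annots):
    """Generalized energy distance between two sets of masks, with d = 1 - IoU."""
    def mean_dist(xs, ys):
        return float(np.mean([1.0 - iou(x, y) for x in xs for y in ys]))
    return (2 * mean_dist(preds, annots)
            - mean_dist(preds, preds)
            - mean_dist(annots, annots))


def hm_iou(preds, annots):
    """Mean IoU under the best one-to-one matching of predictions to annotations."""
    if len(preds) != len(annots):
        raise ValueError("this sketch assumes equally many predictions and annotations")
    n = len(preds)
    scores = [[iou(p, a) for a in annots] for p in preds]
    best = max(sum(scores[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))
    return best / n


mask_a = [[1, 1], [0, 0]]
mask_b = [[1, 0], [0, 0]]
annotations = [mask_a, mask_b]

diverse_preds = [mask_b, mask_a]    # covers both annotations
collapsed_preds = [mask_a, mask_a]  # ignores one annotation

print(ged(diverse_preds, annotations), hm_iou(diverse_preds, annotations))
print(ged(collapsed_preds, annotations), hm_iou(collapsed_preds, annotations))
```

In this toy case, the predictions that reproduce both annotations get a lower GED and a higher HM-IoU than the collapsed predictions, which is the behavior the rebuttal describes.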

⑧ Motivation for using BraTS (MR R4) In addition to ambiguous segmentation tasks, most segmentation scenarios are deterministic. Following other diffusion-based segmentation methods [23,24], we selected BraTS and used Dice to evaluate BerDiff’s accuracy on single annotation tasks. The results show that BerDiff outperforms SOTA methods. In the extended version, we will evaluate BerDiff on the QUBIQ and KiTS datasets for comprehensive discussions and comparisons. Additionally, only one hyperparameter (\lambda_{BCE}) is used.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed the concerns raised. The paper presents a novel method and is of interest to the community.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a computational framework with a new conditional Bernoulli diffusion model for medical image segmentation. Extensive experimental results on LIDC-IDRI and BRATS21 show the promising performance of the proposed method compared with other state-of-the-art approaches. Overall, the paper is well written and has sufficient contributions. The authors provided a thorough rebuttal and addressed the reviewers’ concerns regarding further discussion of related work, ablation analysis of the diffusion kernels, and additional results under other metrics. These addressed items should be included in the final version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents the novel idea of a conditional Bernoulli Diffusion (BerDiff) model for medical image segmentation. The authors have addressed the major concerns from all reviewers well, including the advantages of BerDiff compared to relevant existing studies. The paper has sufficient merit to be accepted.


