
Authors

Mengxue Sun, Wenhui Huang, Yuanjie Zheng

Abstract

In pathological image analysis, determination of gland morphology in histology images of the colon is essential to determine the grade of colon cancer. However, manual segmentation of glands is extremely challenging and there is a need to develop automatic methods for segmenting gland instances. Recently, due to the powerful noise-to-image denoising pipeline, the diffusion model has become one of the hot spots in computer vision research and has been explored in the field of image segmentation. In this paper, we propose an instance segmentation method based on the diffusion model that can perform automatic gland instance segmentation. Firstly, we model the instance segmentation process for colon histology images as a denoising process based on the diffusion model. Secondly, to recover details lost during denoising, we use Instance Aware Filters and a multi-scale Mask Branch to construct a global mask instead of predicting only local masks. Thirdly, to improve the distinction between the object and the background, we apply Conditional Encoding to enhance the intermediate features with the original image encoding. To objectively validate the proposed method, we compared it with state-of-the-art deep learning models on the 2015 MICCAI Gland Segmentation challenge (GlaS) dataset and the Colorectal Adenocarcinoma Gland (CRAG) dataset. The experimental results show that our method improves segmentation accuracy, demonstrating its efficacy.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_64

SharedIt: https://rdcu.be/dnwKk

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This work introduces a deep learning model to segment epithelial glands in colon histopathology images. The proposed model combines an “image encoder” in the form of a feature pyramid network for feature extraction from 2D histopathology images, a diffusion model to generate object masks from candidate bounding boxes, and a mask branch to aggregate the multiscale features from the image encoder. The authors evaluate the model’s performance on two public datasets and show incremental performance gains over existing segmentation models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work addresses a relevant problem in the field of digital pathology. The authors used public datasets to evaluate their model and provide comparisons against published works.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This work combines multiple existing model architectures (FPN, diffusion models) with a large body of related works demonstrating how they can be used for different image segmentation tasks in computer vision and medical imaging. The paper lacks clarity and the experiments presented are not sufficient to help readers gain insight into the novelty in the proposed model.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Lacks important technical details pertaining to the training of the model such as hyperparameters, optimization strategy, etc.
    • Lacks clarity on how the method is employed at inference time
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) Motivation/clinical relevance:

    The motivation behind this work is unclear. In the introduction, the authors state that existing works for gland segmentation “may encounter challenges in capturing cell locations with varying shapes and distinguishing gland boundaries that are in close proximity”. How does this impact these models’ performance and, more importantly, how does it impact the downstream clinical task? In the experiments section, the performance of the different methods being benchmarked seems relatively high (~0.8 Dice). Given that the datasets all rely on manual ground truth annotations, which will always be noisy, can the authors expand on why existing models’ performance needs to be improved and better qualify the type of errors that need to be corrected? How does the proposed method attempt to correct for these errors?

    2) Method:

    The paper would greatly benefit from clarifying the proposed method.

    • How is the model being trained? Are all branches/parts of the architecture trained end-to-end or was there any sequential training process employed?
    • What is the final loss function optimized during training?
    • Can the authors expand on the design choices for the “image decoder”: why did the authors choose to randomly sample boxes from the input image? How are the ground truth masks being used in the diffusion process?
    • How is the concatenation of F_r with the denoising model’s encoder being done? What is the dimensionality of the output of the “image encoder”?
    • The concatenation of F_mask with the decoded output from the diffusion model is unclear: can the authors provide the dimensions of the different tensors being concatenated? What is used as input to the mask branch and what is the output size?
    • How is the model being trained? What is the contribution of the diffusion optimization vs the instance segmentation task?

    3) Experiments:

    • Can the authors add the standard deviation for all metrics reported in tables 1 and 2?
    • How statistically significant are the differences in performance between all models?
    • How are the ablation experiments conducted? The authors write “when employing mask branch…”: does this mean mask branch is used alone or in combination with all other parts of the model? How is the model being trained without the mask branch that concatenates different intermediate outputs?
    • As is, it is very difficult for the reader to understand the added benefit of each component of the model. How does the diffusion model help in improving performance and which metric is improved? How significant are the differences in metrics when adding/removing each component?
    • Can the authors provide examples of cases where SOTA models do not perform well and the type of errors this method aims to correct/improve on?
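The requests above for standard deviations and significance testing can be illustrated with a short sketch. It assumes per-image scores (for example, Dice) are available for two models on the same test images. The function names `summarize` and `paired_permutation_test` and all numbers are hypothetical placeholders, not values from the paper, and the sign-flip permutation test is only one reasonable choice of paired test, not necessarily what the authors used.

```python
import random
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-image scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-image differences.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so we randomly flip signs and count how often the
    resampled mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Placeholder per-image Dice scores (illustrative only, NOT from the paper).
model_a = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87]
model_b = [0.89, 0.86, 0.92, 0.85, 0.88, 0.84]

mean_a, std_a = summarize(model_a)
mean_b, std_b = summarize(model_b)
p_value = paired_permutation_test(model_a, model_b)
print(f"A: {mean_a:.3f} +/- {std_a:.3f}  B: {mean_b:.3f} +/- {std_b:.3f}  p = {p_value:.4f}")
```

Reporting the mean with its standard deviation alongside a paired test on the same images would let readers judge whether a gap between two rows of a results table exceeds image-to-image variability.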

    4) Minor details

    • The writing in the paper can be improved; there are many typos throughout, e.g., the misspelling “boxs”.
    • Figure 1 could be clarified by adding input/output shapes and clarifying the legend
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As presented, the method is very unclear and lacks novelty as well as clinical relevance. It is difficult to understand what drives the performance as the experimental results are relatively slim. See detailed feedback for additional comments.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision

    The rebuttal pointed to a lot of technical details added to the authors’ git repo which isn’t available, making it difficult to get answers to the different questions/concerns raised.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a diffusion model-based method for gland instance segmentation, which treats instance segmentation as a denoising process based on the diffusion model. The proposed model consists of three main parts: Image Encoder, Image Decoder, and Mask Branch, and incorporates conditional encoding for denoising and multi-scale information fusion for accurate segmentation. Experimental results on the GlaS dataset and CRAG dataset demonstrate the effectiveness of the proposed method, as it outperforms several state-of-the-art approaches in gland instance segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel formulation: The proposed method introduces a diffusion model-based approach for gland instance segmentation, treating segmentation as a denoising process. This novel formulation allows for improved precision in instance localization and compensation for missing details in the diffusion model.

    2. Conditional Encoding: The paper employs conditional encoding for denoising, providing a more accurate instance context for the noise filtering process in discriminative tasks. This enhances the performance and segmentation quality of the proposed method.

    3. Multi-scale Information Fusion: The method incorporates multi-scale information fusion through the use of a Feature Pyramid Network (FPN) for accurate and robust segmentation results, addressing the challenges posed by complex gland morphology.

    4. Strong Evaluation: The proposed method is thoroughly evaluated on two datasets, the GlaS Challenge dataset and the CRAG dataset. The experimental results demonstrate that the method outperforms several state-of-the-art approaches in gland instance segmentation, showcasing its effectiveness and potential for clinical applications.

    5. Ablation Study: The paper conducts ablation studies to validate the efficacy of the Mask Branch and Conditional Encoding modules, providing a clear understanding of the contributions of each module to the overall performance of the model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Reliance on Bounding Boxes: The denoising process in the proposed method relies on bounding boxes, which may limit the flexibility of the approach. The paper mentions the intention to reduce reliance on bounding boxes in future work, but this limitation is not addressed in the current study.

    2. Inference Speed: The diffusion model requires multi-step denoising, which makes inference slow and time-consuming. Although the paper acknowledges this limitation and suggests investigating more efficient methods for cross-step denoising, the current work does not provide a solution to this issue.

    3. Lack of Detailed Comparison: While the paper compares the proposed method with several state-of-the-art approaches, it does not provide a detailed discussion of the differences between the proposed method and existing methods. A more comprehensive comparison would have strengthened the paper.

    4. Limited Generalizability: The evaluation of the proposed method is focused on gland instance segmentation in colon histology images. It is unclear how well the method would generalize to other types of medical images or segmentation tasks.

    5. No Discussion of Failure Cases: The paper does not discuss any failure cases or limitations of the proposed method in specific scenarios. An analysis of failure cases could provide insights into areas for improvement.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details about the methodology, datasets, implementation, and evaluation metrics to facilitate reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The paper mentions the reliance on bounding boxes for the denoising process. It would be helpful to elaborate on the limitations of this reliance and discuss potential approaches to reduce the dependence on bounding boxes in future work.

    2. The inference speed of the diffusion model is mentioned as a limitation due to the multi-step denoising process. The authors could consider discussing potential solutions or optimizations to address this limitation and improve the efficiency of the model.

    3. A more detailed comparison with existing methods would strengthen the paper. Specifically, it would be valuable to discuss the key differences between the proposed method and state-of-the-art approaches and explain why the proposed method outperforms them.

    4. An analysis of failure cases or scenarios where the proposed method may struggle would provide insights into areas for improvement and potential future research directions.

    5. If possible, the authors are encouraged to make the code and pre-trained models publicly available to enhance reproducibility and facilitate further research by the community.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel diffusion model-based approach for gland instance segmentation and introduces conditional encoding and multi-scale information fusion to improve accuracy. The method is thoroughly evaluated on two datasets and demonstrates effectiveness compared to state-of-the-art approaches. Limitations include reliance on bounding boxes, inference speed, and generalizability concerns. Overall, the paper makes valuable contributions to the field, but there is room for further improvement and exploration.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper proposes a method for gland instance segmentation based on a diffusion model. The instance segmentation is achieved by treating it as a denoising process using the diffusion model. The proposed method incorporates multi-scale information fusion, resulting in superior performance compared to existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method for gland instance segmentation based on a diffusion model is a novel approach that utilizes the inherent characteristics of gland structures in pathology images. By treating instance segmentation as a denoising process, the method is able to effectively capture and highlight the gland structures in the image.

    Incorporating multi-scale information fusion further enhances the performance of the proposed method. The ability to utilize information at different scales allows for a more comprehensive and accurate representation of the gland structures, leading to improved segmentation results.

    Overall, the proposed method shows significant strength in its ability to accurately segment gland structures in pathology images, with superior performance compared to existing methods. This has important implications for improving the accuracy and efficiency of diagnosis and treatment planning in pathology.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While diffusion models have similar functionality to Generative Adversarial Networks (GANs), the paper could benefit from clarifying the specific advantages of using diffusion models for instance segmentation over GANs. This would help to differentiate the proposed approach from existing methods that utilize GANs.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claim the code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    By emphasizing the advantages of the diffusion model, the paper could strengthen the justification for using diffusion models for gland instance segmentation, and better position the proposed approach in the context of existing methods that use GANs.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an advanced approach to pathology image analysis by employing a diffusion model for classification tasks. The proposed method is able to effectively capture the underlying structure and information in pathology images, resulting in superior performance compared to existing methods. The results and visualizations presented in the paper are comprehensive and provide strong evidence for the effectiveness of the proposed approach. However, to increase the chances of acceptance at a conference such as MICCAI, the paper could benefit from providing more details of the diffusion model training process. This would help to demonstrate the technical rigor of the proposed method and enhance the reproducibility of the results. Additionally, the paper could also include a more thorough discussion of the limitations of the proposed approach and potential avenues for future research. This would help to position the proposed method in the context of existing work and identify area.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The problem proposed is addressed.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    We have received diverse reviewer comments. Two reviewers suggest a decision of accept while one reviewer recommends strong reject. While reviewers confirmed the merits of the paper on using diffusion model for gland segmentation and improved performance, they also raised major concerns including justification of methodological novelty, lack of enough clarity and extensive experimental analysis and comparison, etc. Therefore, a decision of Invite for Rebuttal is recommended for the authors to address the reviewers’ comments.




Author Feedback

Dear Reviewers and Meta-Reviewer, We appreciate your valuable feedback. In general, Reviewer 1 has provided suggestions regarding the training and model details in our work, while Reviewers 3 and 4 have positively recognized the innovation and completeness of experiments. Due to space limitations, we have summarized your comments into several key aspects and provided feedback that addresses all of the concerns. We encourage the reviewers to refer to our GitHub repository: IADM, which includes further details.

Reviewer #1:

  1. Lack of model training details: Our image encoder is pretrained using ImageNet-1K and 21K. Subsequently, the proposed network is trained end-to-end on COCO and LVIS datasets, and fine-tuned on GlaS and CRAG datasets. Detailed documentation of the loss function, hyperparameters and optimization strategies is included in our GitHub repository.
  2. Design choices are unclear, such as the random sampling of boxes from the input image: To address this concern, we have clarified our design choices. We randomly sample boxes from the input image to align with the reverse generation process of the diffusion model. Additionally, the ground truth serves two roles, generating noise boxes during training and calculating the loss. We have updated Fig. 1 to provide more detailed information, including input/output dimensions and the combination method of each component (e.g., F_r and F_mask). Further details such as the inference process shown in Fig. 1 are in the repository.
  3. Motivation is unclear: We clarify that we consider query-based approaches for medical image segmentation as a special case of diffusion models. However, the diffusion model offers the advantage of multi-step stepwise denoising during inference, enabling more refined instance perception. Our motivation lies in utilizing the diffusion model for accurate instance partitioning of glandular structures, which exhibit diverse morphologies and require precise instance perception.
  4. Experimental details can be improved: In response, we have made the following improvements. Details are updated in the repository: 1) Standard deviations and p-values we can obtain are added to the experimental results. 2) We have further analyzed the ablation study. The comparison of the mask branch demonstrates the improvement achieved through multi-scale feature extraction, and the comparison of conditional encoding proves the effectiveness of integrating image features with the intermediate layers of the diffusion model during the diffusion process. 3) We have visualized comparative experiments showcasing suboptimal performance of SOTA models, such as ineffective boundary identification and over-/under-segmentation.
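The role of the ground truth in "generating noise boxes during training" can be sketched as follows. This is a minimal illustration that assumes a standard DDPM-style forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a cosine noise schedule and boxes given as normalized (cx, cy, w, h) tuples. The schedule, the box parameterization, and the names `cosine_alpha_bar` and `noise_boxes` are assumptions for illustration; the rebuttal does not specify them, and steps such as signal scaling, clamping, or padding to a fixed number of boxes are omitted.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level alpha_bar_t under a cosine schedule (assumed)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_boxes(gt_boxes, t, num_steps=1000, seed=None):
    """Corrupt ground-truth boxes with Gaussian noise at diffusion step t.

    Applies x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    independently to every coordinate of every box.
    """
    rng = random.Random(seed)
    a_bar = cosine_alpha_bar(t, num_steps)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [
        tuple(signal * v + noise * rng.gauss(0.0, 1.0) for v in box)
        for box in gt_boxes
    ]


# Hypothetical ground-truth boxes, normalized (cx, cy, w, h).
gt = [(0.30, 0.40, 0.20, 0.25), (0.70, 0.55, 0.15, 0.30)]
for step in (0, 250, 1000):
    print(step, noise_boxes(gt, step, seed=0)[0])
```

At t = 0 the boxes are returned unchanged, and at the last step they are essentially pure noise, which matches sampling random boxes at the start of inference.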

Reviewer #3:

  1. Lack of detailed comparisons: We have updated our GitHub repository to include visualizations that highlight the advantages of our method over existing approaches and illustrate the enhanced performance achieved by our diffusion model.
  2. Generalizability: We have added experimental results on the RINGS prostate dataset (see GitHub repository), showcasing the applicability of our method beyond colon histology images.
  3. Failure cases and limitations: We have observed challenges in the kernel segmentation task, where our network tends to group multiple small targets with unclear boundaries into a single object. This limitation indicates compromised segmentation accuracy when dealing with significant aggregation or overlap of gland instances.

Reviewer #4: Diffusion Models vs GANs: In general tasks, diffusion models have been shown to be more stable during training and capable of generating more detailed results. Moreover, in our specific task of medical image instance segmentation, GANs perform single-step generation and therefore struggle to model complex distributions in one step. The diffusion model, on the other hand, offers the advantage of multi-step denoising during inference, enabling more refined instance perception.
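The multi-step denoising mentioned above, and the inference-cost concern raised in the reviews, can be illustrated with a deterministic DDIM-style sampler that visits only a few timesteps. This is a sketch under stated assumptions: a cosine schedule, a denoiser that predicts the clean signal x_0, and a flat list of coordinates as the state. The names `alpha_bar`, `ddim_sample`, and `predict_x0` are hypothetical, and the oracle denoiser in the usage lines stands in for a trained network; nothing here is taken from the authors' implementation.

```python
import math
import random


def alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level under a cosine schedule (assumed)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def ddim_sample(predict_x0, dim, num_steps=1000, sample_steps=4, seed=0):
    """Deterministic DDIM-style sampling over a short sub-sequence of steps.

    predict_x0(x, t) must return an estimate of the clean signal x_0.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    # Evenly spaced timesteps from num_steps down to 0.
    times = [round(num_steps * (1 - i / sample_steps)) for i in range(sample_steps + 1)]
    for t, t_prev in zip(times[:-1], times[1:]):
        a_t, a_prev = alpha_bar(t, num_steps), alpha_bar(t_prev, num_steps)
        x0 = predict_x0(x, t)
        # Noise implied by the current state and the predicted clean signal.
        eps = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
               for xi, x0i in zip(x, x0)]
        # Jump directly to the earlier timestep (no added noise, eta = 0).
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x


# Oracle denoiser (placeholder for a trained network): always returns the target box.
target = [0.30, 0.40, 0.20, 0.25]
print(ddim_sample(lambda x, t: target, dim=4, sample_steps=4))
```

The `sample_steps` argument is the knob behind the speed versus refinement trade-off: each extra step costs one more denoiser call but gives the model another chance to correct its estimate.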




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Considering all the comments from reviewers and the rebuttal from the authors, the authors addressed the reviewers’ concerns partially. Although the major concerns raised by R1 regarding justification of the method design and lack of clarity remain, the exploration of diffusion models in MIA could be worthy of discussion. Overall, I feel the merits outweigh the weaknesses.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a gland instance segmentation method that treats instance segmentation as a denoising process based on the diffusion model. Furthermore, the proposed method incorporates multi-scale information fusion via a feature pyramid network. The method is evaluated on the public GlaS challenge dataset and achieves better results than other methods. Two reviewers suggest clear paper acceptance whilst one reviewer is against it. Yet in my opinion, the author rebuttal clarifies the methodological details well. The only remaining issue might be the speed during the inference phase (but there are existing approaches to accelerate diffusion model execution that can be applied here as well). In general, I like the novel approach of the paper and its performance on a challenging dataset, and I strongly suggest accepting the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The proposed approach has the goal of segmenting epithelial glands in colon histopathology images based on a diffusion model. Strengths include the application, which is seen as highly relevant, the evaluation on public datasets, the general architecture and the ablation study the authors conducted. In the initial review, the reviewers criticize that the innovation of the approach did not become fully clear, with R2 mentioning that the proposed method is rather a combination of existing methods.

    Unfortunately, the authors added most results in their rebuttal via an external link, which has been removed as per MICCAI policy. External resources during rebuttal are not allowed and this considerably limits the information contained in the review. Given the current state of the paper, I therefore rather align with reviewer #1, as the details of the proposed method were not fully clear. From the figure, it seems that the diffusion happens only (?) for the bounding boxes, rather than the segmentation, with additional details in the description missing. A discussion on the biomedical relevance of the slightly increased performance given the potentially considerable increase in runtime is further not included in the paper and not part of the rebuttal.

    Taken together, the paper, though interesting, in the current form is therefore from my perspective below the acceptance threshold.


