Authors

Yuhao Du, Yuncheng Jiang, Shuangyi Tan, Xusheng Wu, Qi Dou, Zhen Li, Guanbin Li, Xiang Wan

Abstract

Colonoscopy analysis, particularly automatic polyp segmentation and detection, is essential for assisting clinical diagnosis and treatment. However, as medical image annotation is labour- and resource-intensive, the scarcity of annotated data limits the effectiveness and generalization of existing methods. Although recent research has focused on data generation and augmentation to address this issue, the quality of the generated data remains a challenge, which limits the contribution to the performance of subsequent tasks. Inspired by the superiority of diffusion models in fitting data distributions and generating high-quality data, in this paper, we propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks. Specifically, ArSDM utilizes the ground-truth segmentation mask as a prior condition during training and adjusts the diffusion loss for each input according to the polyp/background size ratio. Furthermore, ArSDM incorporates a pre-trained segmentation model to refine the training process by reducing the difference between the ground-truth mask and the prediction mask. Extensive experiments on segmentation and detection tasks demonstrate the generated data by ArSDM could significantly boost the performance of baseline methods.
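The abstract describes the two training signals only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the adaptive loss weights each pixel by the opposite class's area fraction (so small polyps are up-weighted) and that the refinement term is a soft IoU between the ground-truth mask and the mask predicted by a pre-trained segmentation model. The function names `adaptive_weight_map`, `weighted_noise_loss`, and `soft_iou_loss` are hypothetical.

```python
import numpy as np


def adaptive_weight_map(mask: np.ndarray) -> np.ndarray:
    """Per-pixel weights from a binary mask of shape (H, W), 1 = polyp.

    Assumption: polyp pixels get weight (1 - ratio) and background pixels
    get weight ratio, where ratio is the fraction of polyp pixels.
    """
    mask = mask.astype(np.float64)
    ratio = mask.mean()
    return mask * (1.0 - ratio) + (1.0 - mask) * ratio


def weighted_noise_loss(eps_true: np.ndarray, eps_pred: np.ndarray,
                        mask: np.ndarray) -> float:
    """Squared noise-prediction error of shape (C, H, W), re-weighted per pixel."""
    weights = adaptive_weight_map(mask)
    squared_error = (eps_true - eps_pred) ** 2
    # weights[None] broadcasts the (H, W) map over the channel axis.
    return float((weights[None] * squared_error).mean())


def soft_iou_loss(pred_prob: np.ndarray, gt_mask: np.ndarray,
                  eps: float = 1e-6) -> float:
    """1 - soft IoU between predicted probabilities and the ground-truth mask.

    Assumption: stands in for "reducing the difference between the
    ground-truth mask and the prediction mask"; the paper may use another loss.
    """
    gt_mask = gt_mask.astype(np.float64)
    intersection = (pred_prob * gt_mask).sum()
    union = (pred_prob + gt_mask - pred_prob * gt_mask).sum()
    return float(1.0 - (intersection + eps) / (union + eps))
```

With a 4x4 mask containing a 2x2 polyp, the polyp fraction is 0.25, so polyp pixels receive weight 0.75 and background pixels 0.25. A polyp that occupies less of the image therefore contributes more per pixel to the loss.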

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_32

SharedIt: https://rdcu.be/dnwyL

Link to the code repository

https://github.com/DuYooho/ArSDM

Link to the dataset(s)

N/A

Reviews

Review #2

Please describe the contribution of the paper

The authors propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit downstream tasks. ArSDM uses the ground-truth segmentation mask as a prior condition during training and adjusts the diffusion loss for each input based on the polyp/background size ratio. Additionally, ArSDM incorporates a pre-trained segmentation model to refine the training process by reducing the difference between the ground-truth mask and the prediction mask. The data generated by ArSDM could significantly boost the performance of baseline methods, as demonstrated by extensive experiments on segmentation and detection tasks.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors developed a diffusion model guided by segmentation masks that adaptively reweights the loss components for realistic image generation. Such a methodology can be used to counteract the dearth of GI images and to train more accurate polyp segmentation models. They experimented extensively on five polyp segmentation datasets, and the results, while not stellar, are decent.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The only minor concern I have is why, in some cases, PvT and PraNet+SDM outperform ArSDM. Can the authors suggest a reason for this variation? If so, it should be included in the discussion section of the manuscript.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method and training settings are well described, and the architecture is clear.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The only minor concern I have is why, in some cases, PvT and PraNet+SDM outperform ArSDM. Can the authors suggest a reason for this variation? If so, it should be included in the discussion section of the manuscript.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method makes a solid technical contribution, and its validity is thoroughly established through segmentation and detection experiments on five polyp datasets, which I believe meets the standard of MICCAI.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #1

Please describe the contribution of the paper
- The authors propose a novel diffusion model targeting the semantic generation of polyp images using corresponding annotations as priors.
- Furthermore, to compensate for the size of polyps against the amount of background information, the authors use an adaptive loss to re-weight relevant pixels. This is especially useful to generate realistic images with small polyps.
- In addition, the model relies on a prediction-guided mechanism that refines the quality of the synthetic samples by restoring polyp boundary information.
- The model was trained on different standard datasets and the generated samples are used as additional training datasets for segmentation or object detection tasks.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper proposes a novel architecture for generating realistic-looking synthetic polyp images by using semantic binary masks as priors.
- Another novelty is the weighting scheme applied to the loss function to account for polyps of different sizes by considering the foreground/background ratio. With that, the model can generate finely detailed images containing small polyps. In addition, the model relies on a refinement process to generate heterogeneous samples, which in the end acts as a form of regulariser.
- Results show that using the generated images as additional training samples helps polyp segmentation and polyp detection models improve their performance.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Despite the statistically validated results, it would have been interesting to have the results validated by clinicians.
- Limited information was given on how the model is optimised, e.g. which optimiser was used, the learning rate, the batch size, etc.
- The reported metrics (AP, F1, Dice, IoU) measure the improvement in segmentation/detection results obtained with the synthetic samples, but it would have been great to consider additional metrics of data diversity or fidelity, e.g. FID, KID, etc.
- Furthermore, as the baselines are segmentation/detection models, it would have been interesting to see how other generative approaches, such as GANs, perform on a similar task, e.g. SEAN/SPADE/BORDE (medical imaging).
- Finally, it would have been interesting to see what the current limitations of the proposed approach are and what future steps could address them.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- Based on the details given in the document, most of the experiments are reproducible. However, additional information on the hyperparameters/optimisation would be required (see point #6, i.e. optimiser, lr, etc.).
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- The paper shows very interesting results on the benefits of using synthetic data for improving segmentation/detection tasks. However, it would have been nice to evaluate the fidelity/heterogeneity of the generated samples by considering additional metrics, such as FID and KID.
- Although diffusion models seem to perform better than other generative approaches, such as GANs, it would have been interesting to compare the quality of the synthetic samples against those generated by GANs, such as general computer vision models (SEAN/SPADE) or more medical-specific approaches (BORDE/SinGAN-Seg).
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The presented results are convincing and part of an apparently well-defined project. The model relies on novel generative networks, namely diffusion-based approaches, which are on par with the SOTA. In addition, the document is well written and the proposed architecture is innovative.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors proposed a new colonoscopy image synthesis method called Adaptive Refinement Semantic Diffusion Model (ArSDM), including (a) adaptive loss re-weighting and (b) prediction-guided sample refinement mechanisms to maintain the consistency between the polyp morphologies in synthesized images and the original mask. Experiments show that the proposed method has a good ability to generate authentic colonoscopy images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method demonstrates a good ability to generate authentic colonoscopy images.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The authors should conduct the downstream experiments using only the images generated by the proposed method instead of combining them with the real data. Combining them makes the training set twice as large as that of the baselines trained only on real data.
- The reviewer would like to know how to generate images with no/arbitrary masks as mentioned in Sec. 1, making the whole pipeline label-free for curating new datasets. Otherwise, the proposed method is still limited by the number of masks. Please present the downstream application result and image authenticity evaluation result along with the response.
- The authors mentioned that “we sought evaluations from medical professionals to assess the authenticity of the generated samples” but the results are missing. Please provide some reader studies and metrics such as FID and IS to support the authenticity of the proposed method.
- Could the authors provide some visualization results in the ablation study to support its efficacy instead of presenting downstream application performance that cannot directly measure the image’s authenticity? The reviewer especially wants to know what the background-like polyps generated by the baseline method look like and how well the proposed method solves this problem.
- The ideas in the abstract and introduction may sound technical to an extent. However, some segmentation ideas [*1, *2, *3] are also popular in the polyp segmentation field and should be included in related work.
[*1] Precise Yet Efficient Semantic Calibration and Refinement in ConvNets for Real-time Polyp Segmentation from Colonoscopy Videos
[*2] Video Polyp Segmentation: A Deep Learning Perspective
[*3] Colonformer: An efficient transformer based method for colon polyp segmentation
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Detailed parameters are shown in the manuscript. I think reproducibility is fine.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Explanations of and references for LDM and SDM are missing.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed method is convincing, but the experiments are not complete enough.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes the Adaptive Refinement Semantic Diffusion Model (ArSDM) for generating realistic colonoscopy images. It uses semantic binary masks as priors and adaptive loss to handle different polyp sizes. The model incorporates a prediction-guided mechanism to refine the generated samples. The generated images improve the performance of polyp segmentation and detection models. Reviewers praised the novel architecture and the use of diffusion models. However, they noted the need for clinician validation, more information on optimization, additional evaluation metrics, comparison with other generative approaches, and exploration of limitations and future improvements. Overall, the paper received positive ratings and is considered a strong accept with minor weaknesses.

Author Feedback

N/A


ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models