
Authors

Xinrong Hu, Yu-Jen Chen, Tsung-Yi Ho, Yiyu Shi

Abstract

Recent advances in denoising diffusion probabilistic models have shown great success in image synthesis tasks. While there are already works exploring the potential of this powerful tool in image semantic segmentation, its application in weakly supervised semantic segmentation (WSSS) remains relatively under-explored.
Observing that conditional diffusion models (CDM) are capable of generating images subject to specific distributions, in this work, we utilize the category-aware semantic information underlying CDM to get the prediction mask of the target object with only image-level annotations. More specifically, we locate the desired class by approximating the derivative of the output of CDM w.r.t. the input condition. Our method is different from previous diffusion model methods with guidance from an external classifier, which accumulate noise in the background during the reconstruction process. Our method outperforms state-of-the-art CAM and diffusion model methods on two public medical image segmentation datasets, which demonstrates that CDM is a promising tool in WSSS. Also, experiments show that our method is more time-efficient than existing diffusion model methods, making it practical for wider applications.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_72

SharedIt: https://rdcu.be/dnwEp

Link to the code repository

https://github.com/xhu248/cond_ddpm_wsss

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The authors proposed a weakly supervised semantic segmentation framework using conditional diffusion models. Compared to other diffusion model-based methods, the proposed method has faster inference time while having superior performance. The proposed method also outperforms other CAM-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors proposed a weakly supervised semantic segmentation framework using conditional diffusion models which is fairly novel.

    2. The proposed diffusion model-based method has relatively fast inference time compared with other diffusion model-based methods.

    3. The proposed method outperforms state-of-the-art diffusion model-based and CAM-based methods on 2 different public datasets.

    4. Calculating the difference between x_{t-1} and x_{t-1}’ is an interesting and novel idea.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In equations (1) and (2) of the forward process, it is not clear why and how the Gaussian distribution is conditioned on y, which is the image-wise label. Does the forward process follow a different Gaussian distribution for each class label y? If that is the case, it would be better if the authors could explain why that would be a good idea.

    2. It would be beneficial if the authors can expand on why “given the same images at noise level Q, but with different conditions, the noises predicted by the network are supposed to reflect the localization of target objects”. Is that true for all Q? If not, is there a theoretical way to determine Q?

    3. In Fig. 2, is "Guided" the same as CG-Diff, and is "CDM+Guided" the same as CG-CDM? It would be less confusing if the naming and notation were more consistent.

    4. In the ablation study, the method appears quite sensitive to the hyperparameters Q and tau (especially Q). It would be better to conduct the same ablation study on the other dataset and see whether the optimal hyperparameters are similar across datasets. If they are not, this raises concerns: if performance depends this strongly on sensitive hyperparameters, the method would be very hard to tune and of limited practical use.

    5. How accurate are the classifiers used in the experiments? It is also not clear how the accuracy of the classifier affects the performance of the CDM-based WSSS framework.
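
    For context on weakness 1, the standard unconditional DDPM forward process is written out below. This is the textbook formulation, not necessarily the paper's equations (1) and (2). In this common formulation the noising step does not depend on the label y; in typical class-conditional models, y enters only through the learned noise predictor, written here as \epsilon_\theta(x_t, t, y).

```latex
\begin{align}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s).
\end{align}
```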

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be made public upon acceptance and the method does not seem difficult to implement. The datasets used in the paper are public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. It would be better if the authors could revise equations (1) and (2) and explain the statement “given the same images at noise level Q, but with different conditions, the noises predicted by the network are supposed to reflect the localization of target objects” in more detail.

    2. It would make the paper easier to read if the notations are more consistent.

    3. More experiments on how the accuracy of the classifier affects the model performance and how the hyperparameters can be more easily selected would be desirable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper has some issues with notation and details, and needs more experiments to solidify its claims, the idea of the paper is interesting and novel. It takes advantage of the power of diffusion models in generating images while minimizing their disadvantage of long inference time in a weakly supervised semantic segmentation framework.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present a novel method that combines the powerful capabilities of diffusion models with image-level annotations to perform semantic segmentation of anatomical structures and neoplasms on MRI slices. They use a weakly supervised semantic segmentation framework and improve on state-of-the-art methods. They present ablation studies as well as time-efficiency studies, and also compare against existing state-of-the-art methods on two datasets. As a result, the authors present a method with superior performance in both segmentation accuracy and inference efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed a novel method to perform semantic segmentation of medical images. They were inspired by the fact that during the first steps of the reverse diffusion process, the diffusion model encodes semantic information of the image that could be used to perform segmentation of the structures of interest. They decide to explore a weakly supervised approach by using image-level labels to guide the diffusion process. For this, they develop a strategy to incorporate the knowledge provided by the labels of the images into the U-Net model. These ideas are strengthened by using an external classifier to guide the diffusion process. The resulting method outperforms existing weakly supervised semantic segmentation approaches based on diffusion models and on class activation maps. Finally, the authors provide a detailed analysis of the effect of the different hyperparameters to justify their selection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This paper centers around exploring the use of conditional diffusion models in a weakly supervised semantic segmentation task. They demonstrate their method outperforms the two main existing approaches (CAM and DM) and their variants on two public medical image datasets. However, the results obtained with a weakly supervised method are still way behind the fully supervised approach, the authors themselves reported this in one of the tables. A reference is needed that points to the source of the results obtained through the FSL approach. It is worth seeking a WSSS method that could match or approximate a fully supervised setting considering the well known advantages the WS approach offers, like the cost reduction of pixel-wise labeling for segmentation tasks.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The results presented in the paper could be fully reproduced by the community if the code is provided, given that all the experiments were done using public medical image segmentation datasets. The computational resources used to train the model and perform inference are well within the reach of the vast majority of groups conducting research in deep learning applied to medical imaging.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is in general well written. It clearly states their contribution, the problem involved, its objectives, the methods involved, the experiments performed, datasets employed, computational resources, analysis of the results and conclusions. There are minor grammar errors in some paragraphs that must be addressed.

    Overall, it is a well structured paper that shows an improvement over existing methods that employ a weakly supervised approach for semantic segmentation. However, it is necessary to further explore the limits of a weakly supervised approach in order to know if it is possible to achieve a performance similar to the results obtained with a fully supervised approach.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper shows an improvement over existing methods that employ a weakly supervised approach for the problem of semantic segmentation in medical images. The authors achieve this by combining a diffusion model, infusing the knowledge of image-level labels into the U-NET and an external classifier as guidance for the diffusion model. They performed their experiments on two challenging segmentation tasks, kidney segmentation and brain tumor segmentation. On both, they achieved better results in comparison with existing weakly supervised methods. Their approach is also time efficient when performing inference in comparison with the existing methods. The authors conducted an extensive analysis of the several hyperparameters to achieve their results. The authors’ final results are summarized in a table that shows their performance over different metrics. The conclusions are clear and are strongly supported by the experimental results.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    Proposing a novel framework for weakly supervised semantic segmentation (WSSS) using conditional diffusion models (CDM) that leverages the category-aware semantic information underlying the CDM. This framework is different from previous diffusion model methods that rely on external classifiers and accumulate noise in the background during the reconstruction process. Developing a method that calculates the derivative of the predicted noise after a few stages with respect to conditions using the finite difference method. This method highlights the related objects in the gradient map with less background misidentification and does not require the full reverse denoising process for the noised images, making it more time-efficient than existing diffusion model methods. Demonstrating the effectiveness of the proposed framework on two public medical image segmentation datasets for brain tumor segmentation and kidney segmentation, achieving state-of-the-art performance and outperforming.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Proposes a method that calculates the derivative of the predicted noise after a few stages with respect to conditions instead of completely removing the noise from images. Because the output of the diffusion model is not differentiable with respect to the discrete condition input, the method adopts the finite difference method: it perturbs the condition embedding by a small amplitude and records the change in output under the DDIM generative process. The paper is comprehensively structured, conducts adequate experiments, and compares the method with many current state-of-the-art methods, achieving the best score.
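
    The finite-difference idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_denoiser`, the two embeddings, the step size `tau`, and the 0.5 threshold are all hypothetical stand-ins, and a real pipeline would call a trained conditional U-Net inside a DDIM step instead of the toy function.

```python
import numpy as np


def toy_denoiser(x, cond_emb):
    """Hypothetical stand-in for a conditional noise predictor eps(x, e).

    Only the pixels in a fixed 3x3 "object" region react to the condition.
    """
    mask = np.zeros_like(x)
    mask[2:5, 2:5] = 1.0
    return 0.1 * x + mask * cond_emb[0]


def condition_sensitivity(denoiser, x, emb, direction, tau=1e-2):
    """Forward finite difference of the denoiser output along `direction`.

    Approximates |d eps / d e| without differentiating through the
    discrete class label: perturb the embedding by tau and compare outputs.
    """
    base = denoiser(x, emb)
    perturbed = denoiser(x, emb + tau * direction)
    return np.abs(perturbed - base) / tau


rng = np.random.default_rng(0)
x_noisy = rng.normal(size=(8, 8))      # same noised image for both conditions
emb_target = np.array([1.0, 0.0])      # assumed embedding of the target class
emb_other = np.array([0.0, 1.0])       # assumed embedding of the other class

sensitivity = condition_sensitivity(
    toy_denoiser, x_noisy, emb_target, emb_other - emb_target
)
pred_mask = sensitivity > 0.5          # assumed threshold for the toy example
print(int(pred_mask.sum()), "pixels flagged as condition-sensitive")
```

    In this toy setting the sensitivity map is non-zero only where the output depends on the condition, which is the intuition behind using the change in predicted noise to localize the target object.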

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The model may lack some generalizability, for example, the method has only been experimentally validated on U-Net and not on other classical models.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is reproducible. A detailed description of the algorithm as well as the experimental configuration is available in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The method seems to be validated only on U-Net. If possible, it would be better to validate the proposed method on other classical models such as U-Net+Attention, which would better highlight its effectiveness and generality. Describing the proposed method with a detailed algorithm is excellent; to make it easier to understand, the authors could add some figures to illustrate it.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Introduces a novel WSSS (Weakly Supervised Semantic Segmentation) framework with conditional diffusion models (CDM) which highlights related objects in the gradient map with less background misidentification. A full comparison with existing state-of-the-art models is made and the best score is achieved, and ablation experiments further demonstrate the effectiveness of the method. A detailed algorithm description of the proposed method is provided for easy understanding.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a weakly supervised semantic segmentation framework for medical images using conditional diffusion models. The reviewers noted the novel methodological aspects, the fact that the method is weakly supervised, and the detailed analysis. However, they raised some important questions and issues and made suggestions that should be addressed in the camera-ready version of the paper. In particular, eqs. 1-2 should be revised, the notation should be made more consistent, and, if possible, more experiments using models other than the U-Net should be added.




Author Feedback

N/A


