
Authors

Aman Shrivastava, P. Thomas Fletcher

Abstract

In recent years, computational pathology has seen tremendous progress driven by deep learning methods in segmentation and classification tasks aiding prognostic and diagnostic settings. Nuclei segmentation, for instance, is an important task for diagnosing different cancers. However, training deep learning models for nuclei segmentation requires large amounts of annotated data, which is expensive to collect and label. This necessitates explorations into generative modeling of histopathological images. In this work, we use recent advances in conditional diffusion modeling to formulate a first-of-its-kind nuclei-aware semantic tissue generation framework (NASDM) which can synthesize realistic tissue samples given a semantic instance mask of up to six different nuclei types, enabling pixel-perfect nuclei localization in generated samples. These synthetic images are useful in applications in pathology pedagogy, validation of models, and supplementation of existing nuclei segmentation datasets. We demonstrate that NASDM is able to synthesize high-quality histopathology images of the colon with superior quality and semantic controllability over existing generative methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_76

SharedIt: https://rdcu.be/dnwKz

Link to the code repository

https://github.com/4m4n5/NASDM

Link to the dataset(s)

https://warwick.ac.uk/fac/cross_fac/tia/data/lizard/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a generative model based on Noise Diffusion to produce synthetic histopathology tissue samples that are consistent with the conditioning nuclei masks and can be used for training purposes.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors make use of a very recent technology in the field of AI, which is not usually the case in medical imaging, where the application of novel technologies tends to be delayed with respect to other fields.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the authors’ main motivations for proposing this method is that it can be used to improve performance on tasks like nuclei segmentation. However, this point was never evaluated by the authors, and the only quantitative metric is similarity to real data. An additional weakness is that the evaluation neither compares how other methods perform on this metric on the same dataset, nor shows how the proposed method performs on the datasets used in prior work. Therefore, there is no fair comparison with the state of the art.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is somewhat reproducible, since the authors use a public dataset and they provide enough details of the implementation and promise to make the code public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper can be improved in the following ways:

    • extend experiments to other datasets so that fair comparisons can be made with the state of the art. This will highlight the benefits of using a semantic mask as conditioning signal, or using diffusion models instead of GANs.
    • in addition to evaluating performance in terms of inception distance and inception score, which merely compare how similar the synthetic distribution is to real data, I would propose that the authors evaluate on a downstream task (e.g., nuclei segmentation) so that one can understand the value of the proposed method. It may happen that a model producing a slightly worse data distribution provides a better training signal than a model producing a very realistic data distribution that adds nothing new to the training signal.
    • I would suggest removing the stain normalization from the pipeline, enabling the model to produce data with more variability, which is to be expected in clinical practice.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes to address a problem in medical imaging: data availability. They do that through a diffusion model that can generate realistic histology images with a nuclei-mask conditioning signal. The paper is technically sound, but the evaluation is not adequate. Fair comparison to other methods and effect on the downstream tasks are absolutely needed.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    My main concerns were on the evaluation section of the paper: comparison with state of the art methods and impact on downstream tasks. Both concerns have been addressed in the rebuttal and I am happy to update my review accordingly.



Review #2

  • Please describe the contribution of the paper

    The authors leverage recent advances in generative models (specifically, in diffusion models) in conjunction with annotated GI images to train a model for producing synthetic histology images. In this formulation, nuclei masks and class labels are provided to a denoising diffusion model, which in turn produces high-quality pathology images with the semantic characteristics of the input information. The authors then show that this approach produces results superior to other proposed methods in the literature.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very clearly written, and serves as a helpful introduction to diffusion models for users in the medical image field. I appreciate their introduction of their approach in detail.
    • Their method selection is well-suited to their intended use case. Providing semantic information in the form of nucleus masks is novel and clever, and due to semantic nucleus masks being common in the literature, their method is broadly applicable.
    • The evaluation methods and comparison to literature clearly and quantitatively shows the superiority of their approach.
    • The authors perform two useful ablations that help in providing better understanding for the audience.
    • The methods used are closer to computer vision state of the art than other approaches in the literature e.g. GANs.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Ablation over the objective magnification has limited interpretability because, as the authors mention, the amount of training data is reduced. A fairer comparison would involve holding the amount of training data constant (even if it yields worse performance in the 20x use case). As is, it offers limited additional understanding.
    • The authors limit their tests to conditioning via semantic masks that already exist in their test dataset. Because their semantic masks provide (in my opinion) much more information than the methods they compare to, it is difficult to determine to what extent the model is able to generate arbitrary biologically meaningful images. It would be a stronger example to show that a synthetic (but feasible) semantic nucleus mask could be used to generate realistic images.
    • Similarly, the synthetic images derived using this method are highly similar to the images used to create the semantic masks. This suggests that the generative model may have limited capacity to introduce biologically realistic variation provided semantic masks. This could limit utility in generating e.g. synthetic images for training instance segmentation models. Nevertheless, this result potentially simplifies future generative tasks to solely generating semantic masks.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Sufficient detail to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the “weaknesses” section. Specifically, it would be helpful to show that the method works for semantic guidance not contained in the test dataset, and to perform the magnification ablation using the same amount of training data.

    Additionally, it would be helpful to have a bootstrapped CI on the patch-wise IS and FID rather than simply the mean value. Finally, 20x typically implies 0.5 microns per pixel, but this also depends on the instrument used to capture the image. Specifying microns per pixel values will help clarify this.
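
    A percentile bootstrap is one common way to obtain such an interval. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical array of per-patch scores, and `bootstrap_ci` is a name introduced here. The same resampling loop could, in principle, wrap a full IS or FID computation over resampled patch features instead of a simple mean.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-patch scores."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = scores.size
    # Resample patches with replacement and recompute the mean each time.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi

# Hypothetical per-patch scores, for illustration only.
patch_scores = np.random.default_rng(1).normal(loc=2.5, scale=0.3, size=200)
mean, lo, hi = bootstrap_ci(patch_scores)
print(f"mean={mean:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

    Reporting the interval alongside the mean makes it easier to judge whether a difference between two methods exceeds sampling noise.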

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is novel and the method described by the authors has the broad potential to be useful in digital pathology applications. It is furthermore clearly written.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors’ adjustments are good. It would be nice to see the method used in downstream tasks, but even as a standalone result it is strong. I stand by my original accept (even upon reading the critique by other reviewers).



Review #3

  • Please describe the contribution of the paper

    This paper proposed a nuclei-aware semantic tissue generation framework (NASDM) using conditional diffusion modeling. This work can help generate synthetic realistic tissue samples for nuclei segmentation model validation or education purposes for rare diseases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I think this is a novel application of the conditional diffusion model to generate tissue images using semantic masks. There have been attempts to generate synthetic tissue images using GAN models, or using similar DDPMs but with different conditions, such as genotype. This paper creatively leverages semantic nuclei segmentation masks as the condition for training DDPMs, and also explores classifier-free guidance during training to increase the flexibility of the data generation.
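
    For readers unfamiliar with classifier-free guidance, the usual sampling-time combination of conditional and unconditional noise predictions can be sketched as follows. This is the generic formulation and not code from the paper; the function name, the guidance scale `s`, and the toy arrays are assumptions made for illustration. Some papers write the same rule with a weight `w = s - 1`.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, s):
    """Combine noise predictions: eps_uncond + s * (eps_cond - eps_uncond).

    s = 0 gives the unconditional prediction, s = 1 the conditional one,
    and s > 1 extrapolates further toward the condition (here, the mask).
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + s * (eps_cond - eps_uncond)

# Toy stand-ins for the two outputs of a denoising network at one step.
eps_c = np.array([0.2, -0.1, 0.4])
eps_u = np.array([0.0, 0.1, 0.3])
print(guided_noise(eps_c, eps_u, s=1.5))
```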

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper does not discuss the use of imperfect semantic segmentation masks for data generation. For example, when you provide a nuclei segmentation mask in which only some of the nuclei are segmented, what does the synthetic image look like? This is an important aspect to explore, because producing very fine nuclei segmentations is laborious. In addition, if you provide a very poor condition (segmentation mask), will the model fail?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper proposed a nuclei-aware semantic tissue generation framework (NASDM) using conditional diffusion modeling. This work can help generate synthetic realistic tissue samples for nuclei segmentation model validation or education purposes for rare diseases. I think this is a novel application of the conditional diffusion model to generate tissue images using semantic masks. However, some questions need to be addressed:

    1. There is no explanation of the rationale for adding the edges of nuclei segmentations during model training. If no edges are added, will the model performance decrease significantly? If so, what is the reason?
    2. Since you trained the model both conditionally and unconditionally, how will the model perform if you provide partially labelled masks? What is the model’s tolerance for noisy/imperfect semantic masks?
    3. How about evaluating the same nuclei segmentation network’s performance on the real data and the synthetic data? Will it produce a large difference? How will the scale ablation affect the performance?

    Minor: in Section 3.1, last sentence, it should be 10x, not 20x.
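
    On the question about edges, one relevant fact is that in a class-level mask two touching nuclei of the same type merge into a single blob, whereas an edge map derived from the instance mask still separates them. Whether this is the authors’ rationale is not stated in the reviews. The sketch below is an illustration under that assumption, not the authors’ preprocessing; `instance_edges` is a hypothetical helper that marks a pixel as an edge when any 4-connected neighbour carries a different instance id.

```python
import numpy as np

def instance_edges(inst):
    """Binary edge map: pixel differs from a 4-connected neighbour's id."""
    inst = np.asarray(inst)
    edges = np.zeros(inst.shape, dtype=bool)
    # Compare each pixel with its vertical and horizontal neighbours.
    diff_v = inst[1:, :] != inst[:-1, :]
    diff_h = inst[:, 1:] != inst[:, :-1]
    edges[1:, :] |= diff_v
    edges[:-1, :] |= diff_v
    edges[:, 1:] |= diff_h
    edges[:, :-1] |= diff_h
    return edges

# Two touching nuclei (ids 1 and 2) of the same hypothetical class.
inst = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
])
class_mask = (inst > 0).astype(int)   # both nuclei collapse into one blob
edges = instance_edges(inst)
print(class_mask)
print(edges.astype(int))
```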

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. It is a novel application of the conditional diffusion model.
    2. It uses a public dataset and the authors promise to make the code public, so the reproducibility is great.
    3. It also evaluates how much the condition (semantic mask) helps the quality of the synthesized data using classifier-free guidance. It also provides both quantitative and qualitative evaluations.
    
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Summary of the Key Strengths and Weaknesses of the Paper:

    Key Strengths:

    • The paper introduces a novel application of the conditional diffusion model, using semantic masks to generate tissue images.
    • The writing is clear and serves as a useful introduction to diffusion models for users in the medical image field.
    • The method selection, particularly the use of nucleus masks as semantic information, is clever and widely applicable due to the prevalence of such masks in the literature.
    • The authors perform helpful ablations that contribute to a better understanding of the proposed approach.
    • The methods employed align with the state-of-the-art in computer vision, offering advantages over other existing approaches like GANs.
    • The paper demonstrates creative use of semantic nucleus segmentation masks with diffusion models, and explores classifier-free guidance during training to enhance data generation flexibility.

    Areas of weakness in the paper that require further attention:

    • The claim that the proposed method can improve performance in tasks like nuclei segmentation lacks evaluation, as the only quantitative metric used is the similarity to real data.
    • A fair comparison with the state-of-the-art methods is absent.
    • The ablation conducted on objective magnification lacks interpretability since it reduces the amount of training data. A fairer comparison would involve holding the training data constant, even if it results in lower performance for the 20x use case.
    • The tests conducted only use semantic masks existing in the test dataset, making it challenging to determine the extent to which the model can generate biologically meaningful images arbitrarily. It would be stronger to demonstrate the ability to generate realistic images using synthetic semantic nucleus masks.
    • The synthetic images generated using this method closely resemble the images used to create the semantic masks, suggesting limited capacity to introduce biologically realistic variations. This could restrict its utility in generating synthetic images for training instance segmentation models.
    • The results potentially simplify future generative tasks to generating semantic masks alone.
    • The paper does not discuss the use of imperfect semantic segmentation masks for data generation, which is an important aspect to explore.

    To assist the authors in enhancing this study, we offer the following recommendations:

    • Extend experiments to other datasets to enable fair comparisons with state-of-the-art methods, emphasising the benefits of using semantic masks as conditioning signals or diffusion models instead of GANs.
    • Evaluate the proposed method not only based on similarity metrics but also on downstream tasks to demonstrate its value.
    • Consider removing stain normalisation from the pipeline to allow for more variability in the generated data, aligning it with clinical practice expectations.
    • Show that the method can work with semantic guidance beyond the test dataset and perform the magnification ablation using the same amount of training data.
    • Provide bootstrapped confidence intervals on patch-wise Inception Score (IS) and Fréchet Inception Distance (FID) instead of just mean values.
    • Specify the microns per pixel values to clarify the scale of the images, considering that 20x typically implies 0.5 microns per pixel but can vary depending on the capturing instrument.
    • Explain the rationale behind adding edges of nuclei segmentations during model training.
    • Investigate the model’s tolerance for noisy/imperfect semantic masks.
    • Evaluate the performance of the nuclei segmentation network on both real and synthetic data to assess potential differences.
    • Discuss the potential impact of the scale ablation on performance.

    Key remarks the authors should focus on in their responses, in the rebuttal phase:

    • The evaluation lacks comparison with other methods on the same dataset or with prior art, thus lacking a fair comparison with the state-of-the-art.
    • The claim that the proposed method can improve tasks like nuclei segmentation was not adequately evaluated, and the only quantitative metric used is similarity to real data.
    • Elaborate on the use of imperfect semantic segmentation masks for data generation.
    • Provide insights on the model’s tolerance for noisy/imperfect semantic masks.
    • Discuss the evaluation of the nuclei segmentation network on real and synthetic data and the potential differences.
    • Address the potential impact of the scale ablation on performance.




Author Feedback

R1 - Weak Reject (4) R2 - Accept (6) R3 - Accept (6)

We are grateful for the reviewers’ time and their positive and insightful reviews. We are encouraged that they find our work well-motivated (R1), detailed (R2), clear (R2), novel (R2, R3), well supported with ablations (R2) and evaluations (R3), and highly reproducible (R1, R2, R3). We address the reviewer comments below:

1 Lack of comparison on the same dataset (R1) Our quantitative evaluation aims to establish that NASDM is able to generate images indistinguishable from the real images it is trained on. To the best of our knowledge, NASDM is a first-of-its-kind generative model for synthesizing patches conditioned on a nuclei semantic mask, making a fair comparison of its generative prowess difficult. We report FID metrics to measure the divergence between synthetic and real images and demonstrate that our method is able to generate a distribution closer to the one it was trained on compared to other methods. FID, being a pure information metric, is comparable across datasets. However, we appreciate the reviewers’ concerns and will modify our quantitative analysis to include an FID metric for the best competing method on the same dataset as ours. We have computed these statistics already and the overall conclusions from the experiment have not changed. Adding it to the quantitative table would be straightforward.

2 Evaluation on downstream tasks (R1) In this work, we wish to establish that our method NASDM is able to generate medically realistic nuclei in the synthesized patches. As such, we use FID to demonstrate that synthesized samples are very close to real patches. To validate the medical quality further, during the review phase, we finished conducting a pathologist review of the generated patches to specifically evaluate the overall quality of the patch, the medical quality of each type of nuclei, and the consistency of the generated patch with the given semantic mask. We will modify the qualitative section of our work to include the results from this expert review. This will give a better qualitative view of NASDM’s ability to generate realistic nuclei artifacts. Additionally, we will try our best to add downstream evaluation on a quantitative nuclei segmentation task as recommended.

3 Ablation on Objective Magnification (R2) We thank the reviewer for pointing this out. We believe that the reviewer’s contention makes sense and will modify the experiment to reflect the suggestion. This change is fairly straightforward, as the same experiment needs to be presented with an updated analysis.

4 Using synthetic/imperfect semantic masks to generate images (R2, R3) To demonstrate the generative prowess of the model, we use real semantic masks from a held-out set to generate synthetic patches that closely resemble real-world patches. However, we did test the model on unrealistic and imperfect semantic masks for our own analysis and observed that the model is still able to generate medically relevant patterns. We understand and appreciate the reviewers’ (R2, R3) comments and will include samples generated using imperfect and synthetic masks in the final version to give a holistic view of the model’s performance.

5 Synthetic images closely resemble real images (R2) We thank the reviewer for highlighting this. The generated synthetic images presented in the table resemble the real images because most images were selected for the table based on a small MSE score with the real image. As such, the MSE metric inadvertently caused the chosen images to resemble the real images closely. However, if chosen randomly, the synthetic images do not resemble their real counterparts in most cases, and the model is able to generate novel and realistic medical patterns in the patches. We agree with the reviewer’s (R2) comments and will replace the synthetic images in the manuscript to better reflect this. We will also release a large variety of generated patches with the code and trained weights.
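
The selection effect described in point 5 can be illustrated with a small sketch. Everything in it is hypothetical: random arrays stand in for images, and `mse` is an illustrative helper rather than code from our repository. Picking the candidate with the lowest MSE to the real patch yields, by construction, a sample at least as close to the real image as a randomly drawn one.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
real = rng.random((8, 8))                          # stand-in for a real patch
samples = [rng.random((8, 8)) for _ in range(16)]  # stand-ins for generations

errors = [mse(real, s) for s in samples]
best_idx = int(np.argmin(errors))             # MSE-based pick, biased toward real
random_idx = int(rng.integers(len(samples)))  # unbiased pick

print(f"lowest-MSE sample: {errors[best_idx]:.4f}")
print(f"random sample:     {errors[random_idx]:.4f}")
```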




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers were convinced by the answers given during the rebuttal phase. Please take the remarks into account so that the camera-ready version makes for a fruitful and impactful communication at MICCAI.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This application paper presents a conditional diffusion model to synthesize images using semantic masks. The paper presents several examples from the public Lizard dataset with appropriate metrics showing clearly the usefulness of this method. Moreover, the rebuttal strengthens the paper even further by addressing all the outstanding concerns except the evaluation of this approach in the downstream nuclei segmentation task.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose to use a diffusion model for synthetic pathology image generation with the help of nuclei masks. The results show that it can generate realistic synthetic images better than related works. Although the authors’ main focus is to generate realistic synthetic images, the concerns about the evaluation raised by the reviewers are valid. Hence, provided the authors rephrase and revise the manuscript, this work merits acceptance.


