
Authors

Yiqing Shen, Jing Ke

Abstract

The commonly observed histology stain variation may moderately obstruct diagnosis by human experts, but can considerably degrade the reliability of deep learning models in various diagnostic tasks. Many stain style transfer methods have been proposed to eliminate the variance of stain styles across different medical institutions or even different batches. However, existing solutions are confined to Generative Adversarial Networks (GANs), AutoEncoders (AEs), or their variants, and often fall into the shortcomings of mode collapse or posterior mismatching. In this paper, we make the first attempt at a Diffusion Probabilistic Model, called \texttt{StainDiff}, to cope with the indispensable stain style transfer in the histology image context. Specifically, our diffusion framework enables learning from unpaired images by proposing a novel cycle-consistent constraint, whereas existing diffusion models are restricted to image generation or fully supervised pixel-to-pixel translation. Moreover, given the stochastic nature of \texttt{StainDiff}, where multiple transferred results can be generated from one input histology image, we further boost and stabilize the performance by proposing a novel self-ensemble scheme. Our model avoids challenging issues in mainstream networks, such as mode collapse in GANs or the alignment between posterior distributions in AEs. In conclusion, \texttt{StainDiff} suffices to increase stain style transfer quality, while the training is straightforward and the model is simple enough for real-world clinical deployment.



Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_53

SharedIt: https://rdcu.be/dnwJ8

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a diffusion-based model for stain style transfer in histopathology images. It does so by adding a cycle consistency loss to the regular DDPM. The authors also propose to obtain the output image as an average of multiple stochastic runs of the diffusion process.
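
    For readers unfamiliar with the self-ensemble idea summarized above, a minimal sketch follows. It assumes a hypothetical `model.sample(x)` interface that runs one stochastic reverse-diffusion pass and returns one transferred image; the paper's actual interface may differ.

    ```python
    import torch

    @torch.no_grad()
    def self_ensemble(model, x_source, n_runs=8):
        # Run the stochastic reverse diffusion several times on the same
        # input and average the results. `model.sample` is a hypothetical
        # method mapping one source-stain image to one transferred image.
        outputs = [model.sample(x_source) for _ in range(n_runs)]
        return torch.stack(outputs, dim=0).mean(dim=0)
    ```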

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It is one of the first works using DDPM models in histopathology. The method seems to outperform all compared baselines in both tasks used.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper fails to cite some relevant works. As an example, SynDiff (https://github.com/icon-lab/SynDiff) seems to be based on the exact same idea of using a DDPM along with cycle consistency for image synthesis. Another work that is not cited is StainCut (https://pubmed.ncbi.nlm.nih.gov/35877646/), where the same dataset is used and the reported performance seems to be better than the one presented in this work.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide clear descriptions of the hyperparameters used and the public datasets. They also commit to publishing their code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The work could improve in clarity. As a specific example, the paper claims that one of the disadvantages of GAN/AE models is the “challenging alignment of the posterior distributions”. This is repeated several times, but it is not clear what this means exactly. Missing citations such as StainCut and SynDiff seem to be very relevant.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work in its current form is missing some very important citations. SynDiff seems to be based on a very similar idea of combining diffusion processes with cycle consistency. StainCut is a relevant work (not cited here) based on the same dataset, and its reported performance seems higher than the one in this work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a DDPM-based approach for stain style transfer in histopathology images. It is along the lines of StainGAN, utilizing cycle consistency, albeit with diffusion-based generative models instead of adversarial ones. On a single public dataset, the authors have shown the superiority of the proposed method over existing methods, both for stain transfer and in performance on downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem of stain transfer (normalization) is not new, and a plethora of literature exists to handle it with diverse methodologies. However, this paper approaches the methodology from a fresh perspective, introducing cycle consistency into diffusion models. The method is akin to StyleGAN, but introducing cycle consistency into DDPMs is novel and non-trivial. Further, the approach seems effective, as shown in the presented results: the method performs better than conventional stain normalization methods and GAN-based approaches. Another proposed component, the ‘self-ensemble’, is crucial for the performance improvement (Tables 1 and 2). The authors should also have mentioned that such self-ensembling is not possible with other approaches such as StainGAN.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors mention that the proposed methodology is easier to train than GAN-based methods. However, the evidence in the literature is to the contrary: diffusion models are unstable and difficult to train, and adding cycle consistency should make them more complex.

    Another issue (not specific to this approach, but to the methods in this category, such as StainGAN) is the requirement of datasets from both the source and target domains, which may not always be available.

    Another issue may be lower throughput compared to StainGAN-like methods, due to the more complex inference process.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A public dataset is utilized. The code will be made public upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should include the limitations of the approach (please see ‘strengths’), and a comparison of throughput could also be included.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recommendation is based on the novelty of the methodology and its performance. This approach can potentially be a new addition to the solutions for such problems and can form a baseline for future research. Please check ‘strengths’ for more details.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a diffusion probabilistic model (coined StainDiff) for the stain style transfer problem in histopathology. As opposed to generative adversarial networks, autoencoders, and their variants, it does not suffer from the mode collapse problem or the posterior mismatch issue. Thanks to a novel cycle-consistent constraint, StainDiff enables learning from unpaired images, while existing diffusion models require supervised pixel-to-pixel translation. The StainDiff model can generate multiple transfer results from one input histology image. Furthermore, a newly proposed self-ensemble scheme stabilizes performance. Overall, StainDiff is important for increasing the reliability of deep learning models in various diagnostic tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the proposed method is that, unlike stain style transfer models based on generative adversarial networks and autoencoders, it does not suffer from the mode collapse problem or the posterior mismatch issue. The capability of the diffusion model to generate multiple slightly different outputs from one input image is used to improve and stabilize stain style transfer performance (the self-ensemble scheme). It is of practical importance that the proposed StainDiff model can be used for stain normalization with a very simple modification of the loss function, by proclaiming the other domain (say B) as the targeted stain style. Like other learning-based methods, the proposed StainDiff method does not require manual (expert-based) selection of a target image. The advantages of diffusion probabilistic models over GAN- and AE-based models in various image processing problems were known before, but their application to histopathology image style transfer had not been explored. The main reason is that obtaining paired histology slides with different stain styles is not realistic to expect in clinical practice. That issue is circumvented in StainDiff by the invention of a cycle-consistent diffusion model, which allows unsupervised transfer of representations between latent spaces. In a way, this model builds on the success of CycleGAN and StyleGAN.
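
    To make the cycle-consistency idea concrete, here is a minimal pixel-space sketch in the style of CycleGAN. The translators `g_ab` and `g_ba` are hypothetical modules mapping stain domain A to B and B to A (e.g., one reverse-diffusion pass each); note that StainDiff reportedly imposes its constraint in latent space, so this version is only an analogy.

    ```python
    import torch.nn.functional as F

    def cycle_loss(g_ab, g_ba, x_a, x_b):
        # The round trip through both translators should reproduce the
        # original image, which is what makes unpaired training possible.
        loss_a = F.l1_loss(g_ba(g_ab(x_a)), x_a)  # A -> B -> A
        loss_b = F.l1_loss(g_ab(g_ba(x_b)), x_b)  # B -> A -> B
        return loss_a + loss_b
    ```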

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the proposed StainDiff method is its evaluation. The method is developed for multiple stain style transfers, inspired by stain variations across multiple laboratories (institutions) or even stain variations over multiple experimental sessions within the same laboratory. The stain normalization version of the StainDiff method is obtained as a special case when normalization with respect to a target image is required, i.e., one of the domains is considered the target. However, the experiments carried out on the two datasets were both executed on two domains only. For the first dataset, the two domains are two different scanners. For the second dataset, the first domain contains multiple stain styles while the second domain is considered the target style. In my judgment, the experiment carried out on the first dataset could be considered a stain normalization experiment with one style only, i.e., style transfer in one direction only. What I was expecting was a demonstration of stain style transfer in at least two directions across two domains (domain A to domain B, and domain B to domain A), and, if possible, in two directions across three domains (considering a domain as a laboratory).

    In Section 3, Implementations, it is said that the Adam optimizer, which is gradient based, is used for model optimization. However, in Eq. (3) the loss function is based on the L1 norm, which is not differentiable at zero, i.e., its gradient at zero is not defined. I am not sure whether the authors are aware of that fact, and how it can affect the reproducibility of the whole learning process.
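
    As context for this point: autograd frameworks sidestep the non-differentiability of the L1 norm by returning a subgradient at zero (PyTorch picks 0), so Adam still receives a well-defined update and training typically proceeds without issue. A minimal check:

    ```python
    import torch

    x = torch.zeros(3, requires_grad=True)
    loss = x.abs().sum()   # L1 norm, non-differentiable at x == 0
    loss.backward()
    print(x.grad)          # tensor([0., 0., 0.]) -- PyTorch's subgradient choice
    ```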

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The Python code is promised by the authors to be made available after the paper is published. Regarding the clarity of presentation in the Method section, the readers are presumed to be familiar with many advanced concepts and, therefore, the whole learning process is presented with three general equations. I understand that the character of the conference dictates that, but it is demanding for interested readers to gain a deep understanding from such a presentation. The experimental part of the work is described clearly.

    In Section 3, Implementations, it is said that the Adam optimizer, which is gradient based, is used for model optimization. However, in Eq. (3) the loss function is based on the L1 norm, which is not differentiable at zero, i.e., its gradient at zero is not defined. I am not sure whether the authors are aware of that fact, and how it can affect the reproducibility of the whole learning process.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper proposes a diffusion probabilistic model (coined StainDiff) for the stain style transfer problem in histopathology. As opposed to generative adversarial networks, autoencoders, and their variants, it does not suffer from the mode collapse problem or the posterior mismatch issue. Thanks to a novel cycle-consistent constraint, StainDiff enables learning from unpaired images, while existing diffusion models require supervised pixel-to-pixel translation. The StainDiff model can generate multiple transfer results from one input histology image. Furthermore, a newly proposed self-ensemble scheme stabilizes performance. Overall, StainDiff is important for increasing the reliability of deep learning models in various diagnostic tasks.

    Like other learning-based methods, the proposed StainDiff method does not require manual (expert-based) selection of a target image. The advantages of diffusion probabilistic models over GAN- and AE-based models in various image processing problems were known before, but their application to histopathology image style transfer had not been explored. The main reason is that obtaining paired histology slides with different stain styles is not realistic to expect in clinical practice. That issue is circumvented in StainDiff by the invention of a cycle-consistent diffusion model, which allows unsupervised transfer of representations between latent spaces. In a way, this model builds on the success of CycleGAN and StyleGAN.

    The main strength of the proposed method is that, unlike stain style transfer models based on generative adversarial networks and autoencoders, it does not suffer from the mode collapse problem or the posterior mismatch issue. The capability of the diffusion model to generate multiple slightly different outputs from one input image is used to improve and stabilize stain style transfer performance (the self-ensemble scheme). It is of practical importance that the proposed StainDiff model can be used for stain normalization with a very simple modification of the loss function, by proclaiming the other domain (say B) as the targeted stain style.

    The proposed StainDiff is evaluated on two benchmark datasets and compared with GAN- and AE-based stain style transfer methods. On the first dataset, the MITOS-ATYPIA-14 Challenge, there are two stain styles obtained by using two different scanners for slide digitization. In the stain style transfer experiment, the proposed StainDiff method outperformed several methods (including GAN- and AE-based ones) in terms of three image quality metrics in a statistically significant manner. In particular, the self-ensemble version of StainDiff yielded statistically significantly improved results over StainDiff without the ensemble. On the second dataset, StainDiff is used in stain normalization mode, and its quality is verified by the classification accuracy of five deep networks applied to stain-normalized images generated by StainDiff and nine competitors. StainDiff in self-ensemble mode again demonstrated superior performance.

    The method is developed for multiple stain style transfers, inspired by stain variations across multiple laboratories (institutions) or even stain variations over multiple experimental sessions within the same laboratory. The stain normalization version of the StainDiff method is obtained as a special case when normalization with respect to a target image is required, i.e., one of the domains is considered the target. However, the experiments carried out on the two datasets were both executed on two domains only. For the first dataset, the two domains are two different scanners. For the second dataset, the first domain contains multiple stain styles while the second domain is considered the target style. In my judgment, the experiment carried out on the first dataset could be considered a stain normalization experiment with one style only, i.e., style transfer in one direction only. What I was expecting was a demonstration of stain style transfer in at least two directions across two domains (domain A to domain B, and domain B to domain A), and, if possible, in two directions across three domains (considering a domain as a laboratory).

    In Eq. (2), the “~” sign is missing above the symbols on the left-hand side.

    In Section 3, Implementations, it is said that the Adam optimizer, which is gradient based, is used for model optimization. However, in Eq. (3) the loss function is based on the L1 norm, which is not differentiable at zero, i.e., its gradient at zero is not defined. I am not sure whether the authors are aware of that fact, and how it can affect the reproducibility of the whole learning process.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main strength of the proposed method is that, unlike stain style transfer models based on generative adversarial networks and autoencoders, it does not suffer from the mode collapse problem or the posterior mismatch issue. The capability of the diffusion model to generate multiple slightly different outputs from one input image is used to improve and stabilize stain style transfer performance (the self-ensemble scheme). It is of practical importance that the proposed StainDiff model can be used for stain normalization with a very simple modification of the loss function, by proclaiming the other domain (say B) as the targeted stain style.

    The method is developed for multiple stain style transfers, inspired by stain variations across multiple laboratories (institutions). However, the experiments carried out on the two datasets were both executed on two domains only.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    In my review I pointed out two potential issues: (i) The method is developed for multiple stain style transfers, inspired by stain variations across multiple laboratories (institutions) or even stain variations over multiple experimental sessions within the same laboratory. The stain normalization version of the StainDiff method is obtained as a special case when normalization with respect to a target image is required, i.e., one of the domains is considered the target. However, the experiments carried out on the two datasets were both executed on two domains only. For the first dataset, the two domains are two different scanners. For the second dataset, the first domain contains multiple stain styles while the second domain is considered the target style. In my judgment, the experiment carried out on the first dataset could be considered a stain normalization experiment with one style only, i.e., style transfer in one direction only. What I was expecting was a demonstration of stain style transfer in at least two directions across two domains (domain A to domain B, and domain B to domain A), and, if possible, in two directions across three domains (considering a domain as a laboratory). (ii) In Section 3, Implementations, it is said that the Adam optimizer, which is gradient based, is used for model optimization. However, in Eq. (3) the loss function is based on the L1 norm, which is not differentiable at zero, i.e., its gradient at zero is not defined. I am not sure whether the authors are aware of that fact and how it can affect the whole learning process.

    The authors provided a convincing answer to my first comment; I would advise them to include it in the paper. The authors ignored my second comment; nevertheless, I do not think it has a catastrophic impact. I therefore retain my original opinion: accept - good paper with a moderate weakness.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a diffusion-based model for stain style transfer in histopathology images by adding a cycle-consistency loss to the regular DDPM. The reviewers have different opinions on paper acceptance. In the rebuttal, the authors need to address their concerns regarding 1) citation of and relation to SynDiff and StainCut (R1); 2) limitations of the method (training stability and inference time, R2); 3) bidirectional stain transfer (R3; the authors do not need to add new experimental results but should at least include a discussion).




Author Feedback

  1. Citation and relation to SynDiff and StainCut (R1) We believe the major concern of R1 is that StainCUT [1] and SynDiff [2] have not been discussed or cited in our paper. However, we have cited and compared with many state-of-the-art and competitive methods, and will also add these two to our paper. Technically, StainCUT [1] utilizes contrastive learning for image-to-image translation and is hence time-consuming to train. StainCUT is comparable with CL-StainGAN, and the latter was compared in Tab. 1. It is noteworthy that our proposed StainDiff also surpasses StainCUT, achieving an SSIM of 0.721 and an FSIM of 0.753 (vs. StainCUT’s SSIM of 0.695 and FSIM of 0.633). SynDiff [2] is an adversarial diffusion model developed specifically for multiparametric MRI/MRI-CT translation, which cannot be applied directly to histology stain style transfer because it fails to preserve anatomy and tissue structure. Moreover, additional guidance is required for SynDiff’s translation in the image space. To address this limitation, we propose an innovative paired diffusion model architecture, where the cycle-consistency constraint is imposed on the latent space to improve flexibility, along with a self-ensemble scheme, which can significantly improve stain normalization stability. On this count, our method is superior to VAEs in alleviating the challenging alignment of posterior probabilities, and to GANs in being free of training additional discriminators. The medical significance of our solution lies in the development of a universal, easy-to-train, and more capable diffusion model for stain transfer and normalization. Importantly, this approach addresses a gap in previous works, as the same formulation has not been adequately addressed before. [1] “StainCUT: Stain Normalization with Contrastive Learning.” Journal of Imaging (2022). [2] “Unsupervised medical image translation with adversarial diffusion models.” arXiv:2207.08208 (2022).

  2. Limitation (R2)
    • Stability: In the context of color normalization, stability refers to the uniformity of the output stain style given stain-heterogeneous input, rather than stability of the training process. This stability is an additional benefit of the self-ensemble scheme in our StainDiff (as indicated in Tab. 1/2). More importantly, the advantage of diffusion models over GANs is their alleviation of mode collapse, as highlighted in many previous works; this is of great diagnostic importance for pathology images, as it prevents loss of context information, as demonstrated in Tab. 1 [ref. 3, 9, 24].
    • Inference time: Non-deep-learning methods, such as Reinhard, Macenko, Khan, or Vahadane, are superior in processing speed, e.g., a maximum of 0.4 seconds to process a 256×256 image. By contrast, GAN-based methods, such as StainGAN and CL-StainGAN, require 1.0 to 1.1 seconds. Although our method currently takes as long as 32.1 seconds, we can utilize the efficient sampling technique DDIM to achieve a 20x acceleration without any performance drop (a minimal sketch of the DDIM update follows this list). The resulting inference time of 1.6 seconds is comparable to GAN-based methods. Additionally, replacing conventional convolutions with depthwise separable convolutions, or self-attention with FlashAttention, can further reduce the inference time. This speedup is part of our future research and will not hinder the applicability of the proposed StainDiff method in real clinical practice, as will be added to our Conclusion.
  3. Bidirectional Stain Transfer (R3) The proposed StainDiff method can inherently transfer stain styles in a bidirectional manner. Although only the results of transferring from domain A to domain B were presented in the manuscript, it is important to note that the overall trend is consistent when transferring from domain B to A. That is to say, our method consistently outperforms both GANs and conventional methods in either transfer direction.
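
As referenced in point 2 above, the DDIM speedup comes from a deterministic update that allows the reverse chain to skip timesteps (e.g., 1000 down to 50). A minimal sketch of one step, assuming a standard DDPM noise schedule; `eps` is the noise predicted by a hypothetical network eps_theta(x_t, t), and the alpha-bar values are scalar tensors (the authors' exact sampler settings are not given):

```python
import torch

@torch.no_grad()
def ddim_step(x_t, eps, abar_t, abar_prev):
    # abar_t, abar_prev: cumulative alpha-bar products at the current and
    # the (possibly much earlier) previous timestep; subsampling these
    # timesteps is what yields the claimed ~20x acceleration.
    x0_pred = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()       # predict x_0
    return abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps  # eta = 0
```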




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a diffusion-based model for stain style transfer in histopathology images by adding a cycle-consistency loss to the regular DDPM. The authors provide sufficient comparison to existing stain style transfer methods, including classical non-deep-learning ones and GAN-based ones, and demonstrate that the diffusion model is superior to those in terms of similarity measures of the generated images and a downstream classification task. Yet, as R3 pointed out, the proposed method is evaluated in a dual-domain style transfer setting (and the authors explain in the rebuttal that it can be bidirectional). It is not straightforward how it generalizes to a multi-domain setting, which is more practical in real clinical settings. For the other major concern regarding speed and stability of training, the authors provide sufficient answers in the rebuttal in my opinion (currently slow, but possible to speed up to a speed similar to GAN methods). In general, I feel the paper can be accepted, given that it is one of the first attempts to apply a diffusion model to histopathology, and the methodology/experiments are also solid.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a method for DDPM-based stain transfer, which is further stabilized by self-ensembling. The general idea of combining DDPMs with an additional consistency term is appreciated by all reviewers, and in the experiments the method achieves improved results compared to other methods. Weaknesses include some limitations in the related-work selection as well as limitations with regard to training stability and inference time. In their rebuttal, the authors responded appropriately to the mentioned weaknesses, and report higher performance than StainCut, which was the main point mentioned by the most critical reviewer (R1).

    Taken together, the paper presents an alternative, though still somewhat early-stage (run-time-wise) approach for style transfer, which I deem generally suitable for presentation at MICCAI.

    Minor: I am not sure what kind of GAN architectures the authors used, but inference times of >1 sec for a 256×256 image with a feed-forward network seem fairly high.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript presents a diffusion-model-based method for stain style transfer in histology images. Compared with other diffusion-model-based approaches for image synthesis or generation, it uses a cycle-consistent constraint to allow the model to be trained with unpaired images from different domains. The method produces better stain style transfer performance than other competitors, including those based on generative adversarial networks (GANs). The rebuttal has addressed the main concerns from the reviewers, such as the relationship with (or difference from) other works like SynDiff and StainCut, the limitations of the method, and bidirectional stain transfer. Therefore, acceptance is recommended.


