
Authors

Zhenqi He, Junjun He, Jin Ye, Yiqing Shen

Abstract

Histological whole slide images (WSIs) are often compromised by artifacts, such as tissue folding and bubbles, which increase the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems. Existing approaches to restoring artifact images are confined to Generative Adversarial Networks (GANs), where restoration is formulated as an image-to-image transfer. These methods are prone to mode collapse and unexpected mistransfer of the stain style, leading to unsatisfactory and unrealistic restored images. Innovatively, we make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion. Specifically, ArtiFusion formulates artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to simplify the training complexity. Furthermore, to capture local-global correlations in regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. Our extensive evaluations demonstrate the effectiveness of ArtiFusion as a pre-processing method for histology analysis, which successfully preserves the tissue structures and stain style of artifact-free regions during restoration. Code is available at https://github.com/zhenqi-he/ArtiFusion.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_50

SharedIt: https://rdcu.be/dnwJ5

Link to the code repository

https://github.com/zhenqi-he/ArtiFusion

Link to the dataset(s)

https://camelyon17.grand-challenge.org

https://github.com/lu-yizhou/ClusterSeg


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors focus on artifact restoration in histopathological images. The method is based on a denoising diffusion model, and the experiments demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation of the paper is very important in this field. This is the first paper to propose artifact restoration based on a denoising diffusion model. The proposed method does not require artifact images for training. The experiments demonstrate the effectiveness of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The explanations of the proposed method and the experiments are not sufficient; details of this comment appear in No. 9. The difference between the proposed method and DDPM is not clear.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I recommend that the authors add a detailed explanation of the code in the README.md.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In Fig. 1, the border between (a) and (b) is not clear. The methodological difference between the proposed method and DDPM is not clear. In the proposed method, I am not sure how ‘m’ is learned. In the experiments, the comparison methods are not clear: does U-Net + time scheme correspond to ‘U-Net’, and Swin-Transformer + direct summation to ‘Add’? Including ArtiFusion without the time scheme as a comparison method would make the experiments more persuasive. In Table 2, are the values in the rows ArtiFusion (U-Net) and ArtiFusion (Add) swapped by mistake? In the evaluation on the downstream classification task, the meaning of the ‘Artifact’ method is not clear.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please answer my comments above.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal almost satisfied me. I raised my rating.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a diffusion model-based algorithm for artifact restoration in histology images. To capture local-global correlations in regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. ArtiFusion is stated as the first attempt at a diffusion-based artifact restoration framework for histology images.
    2. The training of ArtiFusion relies solely on artifact-free images, which simplifies the training complexity.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of comparisons: the paper only compares the proposed method with a CycleGAN-based baseline. More comparisons would be desirable.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I am wondering about the practical significance of artifact restoration. It is understandable that this prevents bias caused by artifacts when training a pathology image classification model. However, from the perspective of more fine-grained tasks, e.g., nuclear segmentation, are the generated nuclei really of practical importance? Are they really the nuclei that were covered by artifacts?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experiments are not comprehensive enough.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    My rating remains unchanged.



Review #3

  • Please describe the contribution of the paper

    The proposed method in this paper, ArtiFusion, differs from existing methods in that it formulates artifact region restoration as a gradual denoising process and relies solely on artifact-free images for training. Additionally, it uses a novel Swin-Transformer denoising architecture and time token scheme to capture local-global correlations in regional artifact restoration. Existing methods are confined to Generative Adversarial Networks (GANs), which can suffer from mode collapse and unexpected mis-transfer of the stain style, leading to unsatisfactory and unrealistic restored images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The advantages of ArtiFusion include its ability to preserve tissue structures and stain style in artifact-free regions during restoration, as demonstrated by extensive evaluations. It also requires only half the training set size of CycleGAN, another prevalent method used for comparison. However, one potential disadvantage is that there is currently limited literature and open-source code available for comparison with ArtiFusion specifically in the histology domain.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses:

    • Limited direct comparisons to other methods that use self-supervised learning or other approaches
    • Limited information on computational efficiency
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The diffusion method is slow due to the large number of sampling steps. The paper does not provide a direct comparison of ArtiFusion’s speed to existing methods, although the authors do acknowledge that the proposed method is computationally expensive due to the large number of sampling steps required by the diffusion process.

    To address this issue, the authors propose a time token scheme that reduces the number of samplings required during training and inference. Additionally, they use a Swin-Transformer denoising architecture that is designed to capture local-global correlations in regional artifact restoration more efficiently than other architectures.

    While there is no direct comparison of ArtiFusion’s speed to existing methods, the authors do report that their proposed method achieves superior results compared to state-of-the-art GAN-based methods on a public histological dataset. Therefore, while ArtiFusion may be computationally expensive due to its use of diffusion-based artifact restoration, it is still able to achieve promising results in terms of accuracy and preservation of tissue structures and stain style in artifact-free regions during restoration.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Provide more direct comparisons to existing methods: While the paper does demonstrate the superiority of ArtiFusion over state-of-the-art GAN-based methods, it would be helpful to have more direct comparisons to other methods that use self-supervised learning or other approaches.

    2. Provide more information on computational efficiency: The paper mentions that ArtiFusion is computationally expensive due to the large number of samplings required by the diffusion process. However, it would be helpful to have more information on how long it takes to train and test the model and how it compares to existing methods in terms of speed.

    3. Provide more details on dataset and evaluation metrics: The paper briefly mentions that experimental results were obtained on a public histological dataset, but it would be helpful to have more information on the dataset used and how it was preprocessed. Additionally, while the paper reports results using several evaluation metrics, it would be helpful to have more information on why these metrics were chosen and how they relate to clinical relevance.

    Overall, while ArtiFusion is an innovative approach for artifact restoration in histology images, providing more direct comparisons to existing methods, more information on computational efficiency, and more details on dataset and evaluation metrics could help strengthen the paper’s contributions.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel method called ArtiFusion for artifact restoration in histology images. The proposed method uses a diffusion-based approach to restore regional artifacts while preserving tissue structures and stain style in artifact-free regions. The authors also introduce a Swin-Transformer denoising architecture and time token scheme to capture local-global correlations in regional artifact restoration. Experimental results on a public histological dataset demonstrate the superiority of ArtiFusion over state-of-the-art GAN-based methods.

    Recently, diffusion-based methods have become more popular in the generation space due to resolution issues. This paper is a new idea in this context, and I think it will be useful to the readers of the conference.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a new method to remove artifacts and restore clean images for histology, using a diffusion model with a Swin-Transformer architecture. It shows better reconstruction quality than CycleGAN and requires only half the number of training images. All reviewers acknowledge i) the paper's innovation as the first to propose artifact restoration based on a denoising diffusion model, ii) the benefit of not requiring artifact images for training, and iii) the effectiveness of the method in preserving tissue structures. In the rebuttal, the authors need to address the main concerns: 1) terms and potential ambiguities in Table 2 (R1); 2) computational efficacy (R1 & R3); 3) lack of comparison with other methods (R2 & R3). For example, since the authors have image pairs (artifact-free images and simulated artifact images), is pix2pix GAN a more direct comparison than CycleGAN, which is designed for unpaired images?




Author Feedback

  1. Ambiguity Clarification(R1)
    • Fig. 1: The border between (a) and (b) lies between ‘Backbone Network’ and ‘Denoising’.
    • Fig. 2: m represents a Boolean mask that indicates the artifact region, as defined in Eq. 2. It is generated by a thresholding method rather than being a learnable parameter (see the sketch after this list).
    • Tab. 1&2: ‘U-Net’ signifies UNet with Direct Summation (the time scheme is proposed specifically for transformer-based networks and is thus not applicable to UNet); ‘Add’ denotes Swin-Transformer with Direct Summation.
    • Tab. 2 is correct; there is no swap between ArtiFusion (U-Net) and ArtiFusion (Add).
    • ‘Artifact’ refers to our method with full settings.
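
For readers unfamiliar with the mask generation mentioned above, below is a minimal sketch of how a Boolean artifact mask m could be produced by simple intensity thresholding. The grayscale conversion and the threshold value are illustrative assumptions; the rebuttal only states that m is obtained by a thresholding method rather than learned.

```python
import numpy as np

def artifact_mask(image_rgb: np.ndarray, threshold: float = 0.55) -> np.ndarray:
    """Illustrative thresholding: returns a Boolean mask m (True = artifact region).

    `threshold` and the grayscale proxy are hypothetical choices; the rebuttal only
    says m is produced by a thresholding method, not which statistic or cut-off is used.
    """
    gray = image_rgb.astype(np.float32).mean(axis=-1) / 255.0  # crude grayscale in [0, 1]
    return gray < threshold  # darker pixels flagged as artifact (assumption)
```
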
  2. Compare to ArtiFusion w/o Time Scheme (R1): The time embedding is indispensable for diffusion models; the direct summation used in conventional DDPM is already compared in Tab. 2 (denoted ‘Add’). With the Swin-Transformer, this scheme shows a performance reduction of 0.67 SRE and 0.29 PSNR compared to the proposed time scheme.
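
To make the contrast with direct summation concrete, below is a minimal PyTorch sketch of two ways a timestep embedding can be injected into a transformer token sequence. The shapes and function names are assumptions for illustration; ArtiFusion's actual Swin-Transformer time token scheme may differ in detail.

```python
import torch

def inject_time_by_summation(tokens: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
    """'Add' baseline: broadcast the time embedding onto every patch token.

    tokens: (B, N, C) patch tokens; t_emb: (B, C) timestep embedding.
    """
    return tokens + t_emb.unsqueeze(1)

def inject_time_as_token(tokens: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
    """Time-token variant: concatenate the time embedding as an extra token,
    so subsequent attention layers can attend to it explicitly."""
    return torch.cat([t_emb.unsqueeze(1), tokens], dim=1)  # (B, N + 1, C)
```
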

  3. Difference with DDPM (R1): 1) a Swin-Transformer-based denoising network replaces the UNet of DDPM to capture local-global correlations in restoration; 2) a novel time-concatenation scheme is introduced to fuse time information into the hidden features; 3) the denoising region is restricted to artifact areas while the stain style of artifact-free regions is maintained (a sketch of this masked restoration follows below).
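
Below is a minimal sketch of point 3), i.e., denoising only the masked artifact region while re-noising the artifact-free content from the original image so that its tissue structure and stain style are preserved, in the spirit of RePaint-style diffusion inpainting. `denoise_model` and the schedule handling are placeholders; Eq. 2 in the paper defines the exact composition used by ArtiFusion.

```python
import torch

@torch.no_grad()
def masked_reverse_step(x_t, x0, m, t, denoise_model, alphas_cumprod):
    """One illustrative reverse-diffusion step with region restriction.

    x_t: current noisy image; x0: input image containing artifacts;
    m:   Boolean mask (True = artifact region to re-synthesise);
    denoise_model(x, t) -> sampled x_{t-1} for the whole image (placeholder).
    """
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    # Artifact-free region: forward-diffuse the original image to the matching
    # noise level, so stain style and tissue structure stay untouched.
    known = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * torch.randn_like(x0)
    # Artifact region: taken from the learned denoiser's prediction.
    unknown = denoise_model(x_t, t)
    return torch.where(m, unknown, known)
```
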

  4. Compared Method (R1): We compared our method with three other settings: CycleGAN, UNet, and Swin-Transformer-based DDPM with direct time summation (the latter two are ablations, as they are part of the proposed method).

  5. Additional Comparisons (All): Pix2pix GAN and its variants rely on a paired dataset of artifact-free and artifact images for training, and acquiring such paired datasets from the real world is not feasible. On the contrary, our proposed method has the significant advantage of being trainable solely on non-artifact images, which eases dataset curation and enhances practical feasibility and applicability. The formulation of our task involves unique complexities that are challenging to address without the robust capabilities of the diffusion model. Consequently, few previous studies have explored this direction, limiting the available works we could compare our method with. Despite these limitations, we did evaluate our approach against several CycleGAN variants; however, these variants did not provide any substantial improvement over CycleGAN in terms of performance (e.g., CycleGAN: 0.8184, AttentionGAN: 0.8173, UGATIT: 0.8190, DualGAN: 0.8003, ArtiFusion: 0.8216 MSE).

  6. Computational Efficacy (R1/3): ArtiFusion (full setting) restores images faster than the UNet-based denoising network, reducing the restoration time from 112.37s to 30.71s. Compared with CycleGAN, although our approach requires more training/inference time, it outperforms it by a significant margin in restoration performance [avg. training times for CycleGAN and ArtiFusion (UNet, Add, full setting) are 9.4, 41.2, 16.1, and 19.7 mins]. Furthermore, adopting DDIM at the inference stage enables a 20x acceleration (avg. inference time 0.99s for ours + DDIM vs. 1.10s for CycleGAN), slightly faster than CycleGAN without compromising performance.
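
For reference, below is a minimal sketch of the DDIM-style strided sampling that underlies the roughly 20x inference acceleration mentioned above. `eps_model`, the 1000-step training schedule, and the 50-step subsequence are assumptions for illustration, not the authors' exact configuration.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, x_T, alphas_cumprod, num_steps: int = 50):
    """Deterministic DDIM sampling (eta = 0) over a strided subset of timesteps."""
    T = alphas_cumprod.shape[0]                                   # e.g. 1000 training steps
    timesteps = torch.linspace(T - 1, 0, num_steps).long()        # strided subsequence
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        eps = eps_model(x, t)                                     # predicted noise (placeholder)
        x0_pred = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()     # estimate of the clean image
        x = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps # deterministic DDIM update
    return x
```
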

  7. Practical Significance (R2): Digital histology slides are scanned at high resolution, hence a huge proportion of tasks are detection/classification. Artifacts can lead to misdiagnosis by AI models, compromising the precision of automated analysis. Existing solutions simply discard artifact regions, which can potentially lead to loss of valuable information; ArtiFusion can narrow this gap as a preprocessing step. In fine-grained tasks, e.g., segmentation, the generated nuclei might not perfectly replicate every detail, but the inherent randomness of DDPM makes it possible to generate multiple outputs from a single image, providing multiple potential estimations and a measure of uncertainty for more nuanced analysis, and thereby offering more comprehensive and reliable results.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a new method, ArtiFusion, to remove artifacts and restore clean images for histology, using a diffusion model with a Swin-Transformer architecture. It has demonstrated that, compared to a standard U-Net, the Swin-Transformer better captures the local-global context in the restoration task. In the rebuttal, the authors have clarified a few important technical details, including method comparisons and inference time/speed-up. I feel the rebuttal has strengthened the paper, as also indicated by one reviewer raising their score (R1 from 4 to 5). Although substantially more work is needed to demonstrate whether the method can be used to address image artefacts in histology, the current work is still a nice attempt with some promising results and worth sharing at the conference (the code is also available). I recommend accepting the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work presents a new diffusion-based method for artifact restoration. All the reviewers acknowledge the novelty of the approach for this specific task. The main problem I have with this approach is its practical significance, and here Figure 4 is telling. In the relatively cleaner last-row artifact image, where the nuclei are visible underneath the ink artifact, the ArtiFusion method completely removed these visible nuclei in the final result (last column), whereas the CycleGAN approach restored them. I am wondering what the use case for this kind of approach is (aesthetically pleasing restoration, or actual restoration in relatively cleaner images?). This to me is really problematic and without merit. Methods should be presented that address artifact-ridden cases where some clinically relevant information is present and can be verified by the end user. If the clinically relevant information underneath is completely occluded or masked, then we can only hallucinate the signal, which is problematic. This leads me to the reject recommendation.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors present a DDPM-based approach to inpaint artifact regions in histopathology. The general idea and the approach itself is appreciated by the reviewers in the initial review, including it being the first DDPM on this task, the need to not acquire paired images, and the quality of the generated images. Main weaknesses mentioned by the reviewers include the practical significance of the approach, a limited number of comparisons, and questions with regards to the computational efficacy.

    Unfortunately, the authors did not really answer M#1's question in their rebuttal of why they did not use a Pix2Pix network, given that they could generate paired images by adding simulated artifacts (as they do for the evaluation) but disregarded this possibility. One additional aspect not mentioned is that the approach currently assumes the artifactual region to be delineated (segmented), which is not mentioned/discussed in the paper. The runtimes and training times mentioned in the rebuttal are also not fully clear.

    In its current form, the paper presents an inpainting diffusion model with small adaptations that have small effects on the inpainting performance, and without a corresponding review of DDPMs for inpainting. Taken together, from my perspective, the paper is therefore, in its current form, below the acceptance threshold for the MICCAI main conference.


