
Authors

Walter H. L. Pinaya, Mark S. Graham, Robert Gray, Pedro F. da Costa, Petru-Daniel Tudosiu, Paul Wright, Yee H. Mah, Andrew D. MacKinnon, James T. Teo, Rolf Jager, David Werring, Geraint Rees, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

Abstract

Deep generative models have emerged as promising tools for detecting arbitrary anomalies in data, dispensing with the necessity for manual labelling. Recently, autoregressive transformers have achieved state-of-the-art performance for anomaly detection in medical imaging. Nonetheless, these models still have some intrinsic weaknesses, such as requiring images to be modelled as 1D sequences, the accumulation of errors during the sampling process, and the significant inference times associated with transformers. Denoising diffusion probabilistic models are a class of non-autoregressive generative models recently shown to produce excellent samples in computer vision (surpassing Generative Adversarial Networks), and to achieve log-likelihoods that are competitive with transformers while having fast inference times. Diffusion models can be applied to the latent representations learnt by autoencoders, making them easily scalable and great candidates for application to high dimensional data, such as medical images. Here, we propose a method based on diffusion models to detect and segment anomalies in brain imaging. By training the models on healthy data and then exploring its diffusion and reverse steps across its Markov chain, we can identify anomalous areas in the latent space and hence identify anomalies in the pixel space. Our diffusion models achieve competitive performance compared with autoregressive approaches across a series of experiments with 2D CT and MRI data involving synthetic and real pathological lesions with much reduced inference times, making their usage clinically viable.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_67

SharedIt: https://rdcu.be/cVVqo

Link to the code repository

N/A

Link to the dataset(s)

https://www.ukbiobank.ac.uk/

https://www.med.upenn.edu/sbia/brats2018/data.html


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces latent DDPM (Denoising Diffusion Probabilistic Model) into medical image analysis and proposes a new method for unsupervised brain anomaly detection and segmentation. DDPM, as a new generative model, has the potential to model the data distribution with high image quality. The method rests on the following observation: if the input image is from a healthy subject, the reverse process only removes the added Gaussian noise, whereas if the image contains an anomaly, the reverse process also removes part of the signal of the original anomalous regions. Based on this, the authors propose to compute a mask from the sampling, thus detecting and segmenting anomalies in brain imaging in an unsupervised fashion.
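    For readers unfamiliar with the noising step that the observation above refers to, the standard DDPM forward process has the closed form z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps. The following is a minimal sketch of that step only, not the authors' implementation: the function names, the linear beta schedule, and the toy latent vector are illustrative assumptions.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule values are assumptions for illustration, not the paper's.
    """
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(z0, t, alpha_bar, rng):
    """Draw z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I) in one shot."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in z0]


rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
z0 = [1.0, -0.5, 0.25, 0.0]               # toy latent vector (hypothetical)
z_early = q_sample(z0, 10, alpha_bar, rng)   # mostly signal
z_late = q_sample(z0, 999, alpha_bar, rng)   # almost pure noise
```

    Choosing an intermediate t controls how much of the original latent survives the noising step before the learned reverse process is applied.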

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method using latent DDPM for unsupervised medical image analysis is new as there are limited works in this field.
    2. The proposed method is simple and seems reasonable.
    3. The proposed methods are evaluated on four public MRI and CT datasets. The experiment results show an improvement for some cases in an efficient way.
    4. The organization of the paper is also good.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My main concern is the effectiveness of the proposed method. Basically, this architecture is based on latent DDPM [16, 19], and the unsupervised strategy is based on an observation. At the very least, there should be some theoretical support or visualizations for this observation.

    The performance of the unsupervised segmentation is worse than previous methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not mentioned

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. One of the main disadvantages of DDPM is its slow sampling speed. However, the authors claim in the abstract that DDPMs can achieve log-likelihoods that are competitive with transformers while having fast inference times.

    2. I really do not understand why a transformer [16] needs about 10 minutes for inference on a single 2D image. If so, the training process would be extremely long. Can you give more explanation of why?

    3. More convincing visualizations are needed for both the observation (as mentioned before) and the segmentation. From the supplementary material, I get only limited information from the visualization. What do you want to emphasize in Fig. 1?

    4. For BRATS, there are three tumor labels. Which one do the authors use? Which region is the reported DICE score for?

    5. Minor: What is the ‘f’ in Tables 1 and 2? The m on page 5 is referred to before it is defined.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned before

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposed a novel unsupervised pipeline, based on vector quantized variational autoencoder (VQ-VAE) and denoising diffusion probabilistic models (DDPM), for brain anomaly detection and image healing. It achieves comparable performance with state-of-the-art algorithms while having the advantage of fast inference time, which may increase its value in clinical applications.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed pipeline is methodologically sound and very suited to the application. It is interesting that the introduction of DDPM significantly improves the performance of VAE-based methods.
    • The authors performed evaluations on three different synthetic and real datasets, which is comprehensive. The detailed supplementary material and the nice video add to the thoroughness of the paper.
    • The evaluation of the inference time demonstrates its clinical feasibility.
    • This paper is well-written and well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lacks statistical analysis to demonstrate whether the improvements are significant. Also, it would be better if the authors can report the standard deviation of the Dice scores.
    • To better understand the upper bound, it would be valuable if the authors could report the segmentation accuracy of supervised methods (maybe U-Net) in each experiment.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    None

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In addition to the above major comments, please find below some minor suggestions:

    • The authors mention that the proposed approach may be useful in 3D neuroimaging applications in the introduction, but all the experiments are done in 2D. It may be helpful to provide a discussion about this at the end of the paper.
    • Although a video is provided in the supplementary material, it would still be helpful to the readers to show some example images together with their segmentations in the main manuscript.
    • Small typo: “using at diffusion model” in the introduction.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This well-written and well-organized paper is methodologically sound, and the evaluation is comprehensive. I believe it will be of interest to the community.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    I think the topic is interesting and the method is novel. If the authors can revise the manuscript as promised in the rebuttal and also add the experiment on the upper bound in the supplementary, I believe this paper will introduce valuable discussion at the conference.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel method combining a variational autoencoder with a codebook to encode images into a latent space with a denoising diffusion model to “heal” pathological images. This unsupervised model is then used to restore anomalies without the need for manual annotations and then use the residual between the restored and original image to segment anomalies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a clever solution to segment anomalies in an unsupervised manner combining two state-of-the-art techniques for image synthesis. The idea of compressing the original image into a latent space and then correcting that representation by making it closer to the previously learned healthy distribution is an interesting one.

    The model is tested with different public datasets with different types of anomalies (lesions) ranging from small lesions to tumors. Furthermore, execution times for a time-critical approach are also provided to further validate the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper is in general well-written (especially the detailed introduction), the writing starts to fall apart when it comes to the methodology. The abuse of notation only obfuscates the point the authors are trying to make. For example: why say that training samples follow the distribution of the training dataset with notation x0~q(x0)? This fact never comes up again and seems superfluous. The same goes for most of the formulation for VQ-VAE and DDPM. There is a contrast between the way each concept and definition is presented mathematically in the original papers and the rough summary presented here for the same ideas. In fact, most equations are taken as is with slight modifications (and some typos). A short summary of the most relevant ideas and a link to the original papers for the interested reader would have made the methodology clearer and easier to read.

    Why is there a need to combine two different synthesis methods for anomaly detection? Why is VQ-VAE not enough? Why are diffusion models applied on the latent vectors from VQ-VAE necessary? While I can see from the results that the combination is better than VQ-VAE alone, I would have preferred a better intuition for the need to combine both. What happens if the diffusion model is applied on the original data and not the latent representation? Why is the upsampled mask from the latent space so important for the final result? What happens if the mask is used as is? Hypotheses or discussions on these questions would strengthen the paper.

    The results on the synthetic dataset are misleading. First of all, a dataset of 64x64 images seems like a less than ideal benchmark for segmentation, where image detail is one of the most important things. Furthermore, the way the experiment is set up (by randomly masking pixels) is unrealistic when compared to lesions in pathological brains. As a consequence, the impressive results obtained in Table 1 are far higher than those obtained on real imaging datasets (Tables 2 and 3). In fact, the ensemble model obtains the highest results in Table 2. While this is compensated by the time comparison in Table 3, for segmentation purposes the results seem fairly low for all the methods (especially for small lesions).

    Finally, the metrics are either not defined or their definition is relegated to the captions. What does “theoretically best possible Dice score” mean? How is that upper bound calculated? Why is AUPRC never defined (I assume it means area under the probabilistic ROC curve)? Furthermore, some other concepts are also poorly defined. I understand that the original image has HxW dimensions, while the latent representation has hxw, which implies a smaller size. However, this is never clearly stated. While it is implied by the fact that masks obtained on the latent space are upsampled at the end, clearly stating the relationship between the dimensions would help.
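    To make the metric questions above concrete, here is a minimal sketch of how such quantities are commonly computed. AUPRC conventionally denotes the area under the precision-recall curve. Treating the "best possible" Dice as the maximum over a sweep of thresholds on the anomaly scores is one common convention in unsupervised anomaly segmentation; whether the paper uses exactly this definition is an assumption. All function names and the toy data are hypothetical.

```python
def dice(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for flat lists of 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 if denom == 0 else 2.0 * inter / denom


def best_possible_dice(scores, target):
    """Maximum Dice over all thresholds on the scores (uses the ground truth)."""
    best = 0.0
    for thr in sorted(set(scores)):
        pred = [int(s >= thr) for s in scores]
        best = max(best, dice(pred, target))
    return best


def average_precision(scores, target):
    """Step-wise estimate of the area under the precision-recall curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    positives = sum(target)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if target[i]:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / positives if positives else 0.0


# Toy per-pixel anomaly scores and ground-truth labels (hypothetical).
scores = [0.9, 0.8, 0.3, 0.1]
target = [1, 0, 1, 0]
best = best_possible_dice(scores, target)
ap = average_precision(scores, target)
```

    Because the threshold sweep uses the ground truth, the resulting Dice is an upper bound on what any single fixed threshold could achieve on that data, not a deployable operating point.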

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While the code is not publicly available (and the reproducibility checklist says so), I believe the authors might have used the original repositories for VQ-VAE and DDPM which are publicly available. Even though implementation details are not given, it might be possible to reproduce the method partially. Furthermore, the results are presented using public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    One thing that I think could paint the results in a better light, would be to present detection results instead of segmentation. In critical cases where detecting possible anomalies in time is more important than getting a good delineation, a good detection metric paired with a small execution time would be required. For example, a set of anomalous and normal images could be gathered and then the detection rate of anomalous regions could be used instead of Dice. That would help recontextualize Tables 2 and 3.

    Another aspect that I think could be improved is the methodology section. There is an abuse of math notation that is mostly lifted from the original papers but removing some key details and definitions. As a consequence, the summarized explanations are hard to follow and confusing. I would have preferred a high level explanation (which is partially given) with a limited use of notation complemented by the reference to the original papers for the interested readers. One example of this is the explanation of Lcodebook as “We used the exponential moving average updates for the codebook loss”. Without checking the original paper and only reading the explanations on the manuscript it is impossible to understand what it is referring to.

    Finally, this is more of a personal opinion, but I think that it would be interesting to use classical unsupervised segmentation approaches to compare where possible. They are simpler, before the deep learning revolution they were extensively used, they are fast (especially if implemented in GPU) and they can also give good results. That would further help contextualize the idea of unsupervised lesion segmentation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While I think the idea is interesting and it is great to see new unsupervised applications for deep neural networks that heavily rely on supervised training, there are some concerns that make me hesitate about the paper. The methodology seems to be unnecessarily obfuscated. I think it would have been much more valuable to give a general intuition about why these two methods need to be combined. Finally, the contrast between the synthetic results and real ones makes me feel wary about the contribution. According to the results, it is really hard to say that the proposal can actually address segmentation in real scenarios.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    After the rebuttal, the authors have addressed one of the main issues I had with the original submission about the methodology. I now understand that the VAE is only used for compression instead of trying to estimate the data distribution. I am still not entirely certain about the details (and I plan on reading the references again), but I better understand the point of the authors. As a consequence, I am slightly raising my score.

    However, looking at the results that will mostly remain the same, I am still unconvinced about the need for what is essentially a 3-step process (VAE, DDPM and the final segmentation). I still believe there is a lack of focus on detection (which is part of the title) and that the segmentation results on real data are still subpar. Nonetheless, the other contributions are strong enough to deserve (weak) acceptance as a promising initial idea into the topic.

    As a final note, I strongly encourage the authors to further explore some of the remaining unanswered questions in a future extended paper (especially detection).




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received a mixed review of positives and negatives. All reviewers agreed that the idea of introducing a latent denoising diffusion probabilistic model for brain anomaly detection and segmentation is new and interesting. However, several concerns of this paper arise from different aspects (more details can be found in reviewers’ feedback). The authors are highly encouraged to respond to all reviewers’ questions and concerns in the rebuttal letter, particularly focusing on those from R3.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

REVR#1 Sec 6.1. Thank you for highlighting this point. Indeed, DDPMs have a slower sampling speed than GANs and VAEs, but they are faster than autoregressive transformers. To be precise, we will change it to “…while having relatively fast inference times…”. Sec 6.2. During training, transformers can learn to predict the probability of the next token for the whole input sequence simultaneously. This parallelization is one of the key advantages of transformers compared to previous methods used to model sequences (e.g., LSTMs). However, the method proposed in [16] needs to sample new values for tokens with low probability, and for autoregressive models this sampling must be done sequentially (i.e., we first need to sample and replace all previous low-probability tokens before using this partially healed sequence as input to sample the value for the next low-probability token). This can result in several inference steps, which makes the method slow. Sec 6.3. Thank you for the comment. As also requested by other reviewers, we will improve the visualization of the results by adding a figure showing the input image and the predicted segmentation. We plan to make space for it by revising Sec. 2 as requested by REVR#3. In Fig. 1, we show an overview of the models involved in the method and the main processes in the DDPM. Sec 6.4. We used all labels from intra-tumoral structures (i.e., edema and core parts) as the anomaly mask to obtain the DICE score.
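The sequential dependency described in the Sec 6.2 response can be illustrated with a toy sketch. This is not the method of [16]: the "model" below is a hypothetical stand-in that simply expects each token to be the previous token plus one, greedy replacement is used instead of sampling, and the pass counter is a simplified accounting of one parallel scoring pass plus one extra pass per replaced token.

```python
def toy_next_token_probs(prefix, vocab_size):
    """Hypothetical stand-in for a transformer forward pass.

    Puts probability 0.9 on (last token + 1) mod vocab_size, or on 0 for an
    empty prefix, and spreads the remaining 0.1 over the other tokens.
    """
    expected = (prefix[-1] + 1) % vocab_size if prefix else 0
    probs = [0.1 / (vocab_size - 1)] * vocab_size
    probs[expected] = 0.9
    return probs


def heal_sequence(tokens, vocab_size, threshold=0.05):
    """Replace low-probability tokens left to right, counting forward passes."""
    healed = list(tokens)
    forward_passes = 1  # initial pass scores every position in parallel
    for i in range(len(healed)):
        # Each token is judged against the already healed prefix.
        probs = toy_next_token_probs(healed[:i], vocab_size)
        if probs[healed[i]] < threshold:
            healed[i] = max(range(vocab_size), key=probs.__getitem__)
            forward_passes += 1  # later tokens must be re-scored afterwards
    return healed, forward_passes


healthy = [0, 1, 2, 3, 4, 5, 6, 7]
one_anomaly = [0, 1, 5, 3, 4, 5, 6, 7]
two_anomalies = [0, 1, 5, 5, 4, 5, 6, 7]

healed_one, passes_one = heal_sequence(one_anomaly, vocab_size=8)
healed_two, passes_two = heal_sequence(two_anomalies, vocab_size=8)
_, passes_healthy = heal_sequence(healthy, vocab_size=8)
```

In this toy setting the number of passes grows with the number of replaced tokens, which mirrors the authors' point that inference cost depends on how many low-probability tokens must be healed one after another.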

REVR#2 Thank you for your suggestions, especially regarding adding a supervised method as an upper bound. We agree that it would be interesting to check how close our method is to the supervised ones. It will not be possible to add this analysis without changing our article substantially, but it would be great to include it in a future extension of this study. As requested by you and REVR#1, we will introduce a figure showing the segmentation in the revised version.

REVR#3 Sec 3 par 1. Thank you for your comments. As this is a methodological paper, we wanted to present most of the details of the proposed method and related models without depending on the reading of previous papers. We agree that using only a summary could make the method section easier to read. We will address this point in the revised version. Sec 3. par. 2. 1) We believe there was a misunderstanding in the first questions. As pointed out in Sec 2.1, based on previous studies [16, 19], we are using the VQVAE only as a compression model (where we do not include any generative part as proposed in [16, 19]). Only the diffusion model is used as the generative model that learns the data distribution (as proposed in [19]). We will make it clearer in the revised version. 2) We used the latent representation to make our method more scalable. This is a crucial feature that allows our method to be applied to images with higher resolutions or 3D data (as said in the Introduction). 3) Similar to [16], our VQVAE models can create blurry areas in the reconstructions, which results in high values in the residual maps (generating false positives). To mitigate the VQVAE fidelity limitation, we use the mask created by the diffusion model to filter the residual map (similar to [16]). 4) We observed that the Gaussian filter helps with the blocky aspect of the mask (due to its creation in a lower dimensional space), resulting in a better anomaly segmentation performance. 5) Thank you for your suggestions. Due to the page limit, we could not include these discussions, but most of them are already discussed in similar methods [16], whereas, in our study, we focus on novel aspects, like inference time. Sec 3. par. 4. 1) Thanks for pointing that out. We will add the metrics definition in the revised version. 2) We tried to highlight the size of the latent space in the last sentence of Sec 2.1 and by using the term “compression model”. We will highlight it further in the revised version.
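Points 3) and 4) of the response to REVR#3 describe filtering the pixel-space residual map with a mask produced in the lower-resolution latent space and smoothed with a Gaussian filter. The sketch below shows one way this could look. It is an illustration under stated assumptions, not the authors' code: nearest-neighbour upsampling, element-wise multiplication as the "filter", the sigma value, and all function names are hypothetical.

```python
import math


def upsample_nearest(mask, factor):
    """Repeat each latent-mask cell `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """1D convolution along each row, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
                 for k in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius=max(1, int(3 * sigma)))
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def filter_residual(residual, latent_mask, factor, sigma=1.0):
    """Weight the residual map by the smoothed, upsampled latent mask."""
    smooth = gaussian_blur(upsample_nearest(latent_mask, factor), sigma)
    return [[r * m for r, m in zip(r_row, m_row)]
            for r_row, m_row in zip(residual, smooth)]


latent_mask = [[0.0, 0.0], [0.0, 1.0]]        # 2x2 mask in latent space (toy)
residual = [[1.0] * 8 for _ in range(8)]      # 8x8 residual map (toy)
filtered = filter_residual(residual, latent_mask, factor=4)
```

In this toy example the residual is kept where the latent mask flags an anomaly, suppressed elsewhere, and the blur softens the blocky boundary that the upsampling would otherwise leave.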




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers recommended (weak) accept after going through the authors’ rebuttal. While it seems the authors have addressed most of the major concerns of all reviewers, questions on the necessity of the complicated network architecture and on a thorough evaluation of the method still remain. The authors should carefully revise the current manuscript to address reviewers’ concerns as promised in their rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers unanimously agree that the paper addresses an interesting topic with a novel method. The authors are strongly encouraged to address the points raised before rebuttal, in the final paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall an interesting paper with mixed reviews. The majority were positive, and the authors properly addressed the issues raised by R3. I would therefore consider this paper an interesting contribution to MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


