Authors

Cosmin I. Bercea, Daniel Rueckert, Julia A. Schnabel

Abstract

Detecting abnormal findings in medical images is a critical task that enables timely diagnoses, effective screening, and urgent case prioritization. Autoencoders (AEs) have emerged as a popular choice for anomaly detection and have achieved state-of-the-art (SOTA) performance in detecting pathology. However, their effectiveness is often hindered by the assumption that the learned manifold only contains information that is important for describing samples within the training distribution. In this work, we challenge this assumption and investigate what AEs actually learn when they are posed to solve anomaly detection tasks. We have found that standard, variational, and recent adversarial AEs are generally not well-suited for pathology detection tasks where the distributions of normal and abnormal strongly overlap. In this work, we propose MorphAEus, novel deformable AEs to produce pseudo-healthy reconstructions refined by estimated dense deformation fields. Our approach improves the learned representations, leading to more accurate reconstructions, reduced false positives, and precise localization of pathology. We extensively validate our method on two public datasets and demonstrate SOTA performance in detecting pneumonia and COVID-19. Code: https://github.com/ci-ber/MorphAEus
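
To make the reconstruction-based setting described in the abstract concrete, the following minimal Python sketch computes a pixel-wise residual between an input and its pseudo-healthy reconstruction and aggregates it into an image-level score. The function names, the toy 2x2 images, and the mean aggregation are assumptions made for illustration; the scoring actually used in the paper may differ.

```python
def anomaly_map(x, x_rec):
    """Pixel-wise absolute residual between an input image and its reconstruction."""
    return [
        [abs(a - b) for a, b in zip(row_x, row_rec)]
        for row_x, row_rec in zip(x, x_rec)
    ]


def anomaly_score(residual):
    """Image-level score: mean residual over all pixels (an assumed aggregation)."""
    values = [v for row in residual for v in row]
    return sum(values) / len(values)


# Toy data (made up): the input has one bright pixel that the
# pseudo-healthy reconstruction does not reproduce.
x = [[0.25, 0.25], [0.25, 1.0]]
x_rec = [[0.25, 0.25], [0.25, 0.25]]

residual = anomaly_map(x, x_rec)  # large only where input and reconstruction differ
score = anomaly_score(residual)   # higher score suggests a more anomalous image
```

In this scheme, a reconstruction that copies the pathology would give a near-zero residual and miss the anomaly, while an inaccurate reconstruction of healthy tissue would inflate the residual and cause false positives.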

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_30

SharedIt: https://rdcu.be/dnwG8

Link to the code repository

https://github.com/ci-ber/MorphAEus

Link to the dataset(s)

https://bimcv.cipf.es/bimcv-projects/padchest/

https://www.rsna.org/education/ai-resources-and-training/ai-image-challenge/rsna-pneumonia-detection-challenge-2018

Reviews

Review #1

Please describe the contribution of the paper

Autoencoders (AEs) are often used for detecting anomalies in medical images. However, their effectiveness is often limited, depending on the data distribution. The paper proposes MorphAEus, a deformable AE approach for improving learned representations, leading to more precise localization of pathology. The deformation module in MorphAEus decouples local and global structural reconstruction, so that the reconstruction error occurs mainly on unseen abnormal structures rather than on complicated normal structures. The proposed approach demonstrates state-of-the-art performance in detecting pneumonia and COVID-19, as validated on two public datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The literature review is comprehensive and provides a thorough analysis of relevant studies
- Autoencoders are the most widely used anomaly detection models, yet their known limitation of reconstructing unseen structures remains unsolved. The paper tackles this problem and validates its effectiveness with a sizable performance improvement.
- The illustrative examples are adequate. Figure 1 highlights the main advantage of the proposed method. Figure 2 illustrates the limitations of autoencoders. Figure 4 shows an example of how the proposed method differs from previous ones.
- The ablation study shows the importance of each design choice.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
There are some clarity issues.
- Based on the figures, the morphed reconstructed output seems to be used for inference. However, it is not explicitly stated whether the morphed reconstructed output or the original reconstruction output of the Autoencoder is used for inference
- There is little information about the deformation. It is unclear how the deformation is applied, which type of transformation is used, and what input is provided to the Spatial Transformer Network module for estimating transformation parameters. The authors could provide additional information to address these concerns.
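
As a generic illustration of the mechanism this comment asks about, a dense deformation field can be applied by resampling the image at displaced coordinates with bilinear interpolation, as spatial-transformer-style layers do. The sketch below is written under assumptions (backward warping, border clamping, hypothetical function names, a hand-made field); it is not the deformer used in MorphAEus, whose details the paper and reviews leave open.

```python
import math


def bilinear_sample(img, y, x):
    """Sample img at fractional (y, x); coordinates are clamped to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
    bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp(img, flow):
    """Backward warp (assumed convention): out[i][j] = img(i + dy, j + dx)."""
    return [
        [
            bilinear_sample(img, i + flow[i][j][0], j + flow[i][j][1])
            for j in range(len(img[0]))
        ]
        for i in range(len(img))
    ]


# Toy image and two hand-made displacement fields of (dy, dx) pairs.
img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
identity = [[(0.0, 0.0)] * 3 for _ in range(2)]
half_pixel = [[(0.0, 0.5)] * 3 for _ in range(2)]

unchanged = warp(img, identity)   # zero displacement returns the input
shifted = warp(img, half_pixel)   # each pixel reads half a pixel to its right
```

In a learned setting, the field would be predicted by a network and the interpolation would need to be differentiable so that a similarity loss between the warped reconstruction and the input can be backpropagated.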
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the paper does not address important implementation details, it contains a link to the GitHub project repository.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Add a description of how the proposed method works at inference time.
- Add more implementation details, or a reference, on how the deformer is designed.
- Although there are examples that demonstrate how MorphAEus behaves differently from other AE-based methods, it may not be immediately apparent why it is more effective at reconstructing in-distribution (ID) samples, even when the input is out-of-distribution (OoD). In my view, during the training, the reconstruction loss captures abstract semantic information from normal samples, while the morph loss captures structural details that cannot be learned by the reconstruction loss alone, such as complex structures or sharp boundaries. One question that arises is whether it is necessary to train the AE and deformation module jointly. Can these two distinct roles be achieved when they are decoupled during training, or is joint training necessary to achieve such performance? Analyzing the reciprocal effect of the AE and deformer would be an interesting study.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes an AE-based anomaly localization method. While AEs have been widely applied for anomaly detection and localization in medical applications, their limitation in reconstructing out-of-distribution (OoD) samples remains a challenge. The paper explores the issue with adequate examples and analysis and presents a transformation-based approach that demonstrates a notable performance improvement.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper addresses the problem of anomaly detection in medical images where the distributions of normal and pathological images might overlap considerably. First, a comparison of SOTA autoencoders (AEs) on chest X-rays highlights their limitations: a poorly constrained latent space that allows the reconstruction of anomalous patterns, and limited decoder capacity that makes the reconstruction of normal images insufficiently accurate. Second, the authors introduce two properties to improve on existing AEs, minimality and sufficiency, and they propose deformable AEs where minimality is addressed by adding a perceptual loss term to the reconstruction loss, and where sufficiency is addressed by means of a local morphometric loss that quantifies the local matching of the reconstructed and input images using shared encoder-decoder features. Experiments demonstrate the superiority of the proposed MorphAEus in terms of area under the ROC curve for the detection of pneumonia and COVID-19.
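
For readers unfamiliar with the metric mentioned above, the area under the ROC curve can be read as the probability that a randomly chosen abnormal image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by pairwise comparison; the function name and the scores are made up for illustration and are not results from the paper.

```python
def auroc(scores_normal, scores_abnormal):
    """Fraction of (abnormal, normal) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_normal) * len(scores_abnormal))


# Hypothetical anomaly scores (not taken from the paper).
normal = [0.1, 0.2, 0.4]
abnormal = [0.3, 0.5, 0.6]

value = auroc(normal, abnormal)  # 8 of the 9 pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of normal and abnormal scores.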
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- the paper provides valuable insights on what auto-encoders learn. Particularly Fig 2 illustrates very well the continuum between the capabilities of an AE to copy the input and to learn the training distribution by heart, depending on the depth of the model.
- an original local morphometric adaptation inspired by learning-based non-rigid image registration and deformation modeling. The proposed method allows for a better reconstruction.
- a strong evaluation of the proposed method and SOTA on several public datasets and a rigorous ablation study demonstrating the importance of both the perceptual loss and the morphometric loss.
- parameter selection is detailed in the supplementary material
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- limited discussion on extensibility: the study is conducted only on chest X-rays and two types of anomalies, pneumonia and COVID-19. I am curious how large the overlap between the distributions is. Also, would it work if the anomaly were, say, a displacement of the anatomy (for example, a vertebral displacement)?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide an anonymous repository with all the code and links to the datasets.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The paper provides an original contribution to the field of anomaly detection, which I believe is of high interest to the MIC community. Judging by the results in terms of pathology detection scores, the approach is promising, surpassing other SOTA AEs, at least on chest X-rays. The paper is well written, simple to follow and well illustrated. In Figure 4, we can see that MorphAEus, on the healthy X-ray (top row), reconstructs the image without the highly radio-opaque artifact (or device) in the top left of the image. I believe that the corresponding anomaly map would have highlighted it, potentially returning a false positive in this case. The paper could benefit from a discussion on the impact of such false positive cases. The paper could also benefit from a short discussion on the extensibility of the method (distribution overlap, types of anomalies). Two minor comments:
- page 5 last paragraph of section 2, the sentence “avoird the reconstruction pathologies” may need to be reviewed
- last sentence of section 4, “the average pathology detection increases from 66.8 to 80.2” not sure where the 80.2 comes from
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper provides an original method for anomaly detection to address the limitations of SOTA illustrated as well in the paper. The evaluation is strong. The manuscript is well written and simple to follow. I believe it has a clear value to the MIC community.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors have highlighted the limitations of autoencoders (AEs) for anomaly detection, where out-of-distribution samples can still be well-reconstructed despite the model being trained only on in-distribution samples. Through extensive experiments, they have empirically demonstrated that even state-of-the-art AEs suffer from this limitation. To address this issue, the authors propose a novel AE model that incorporates a perceptual loss and deformation techniques. This approach effectively reconstructs pathology samples as pseudo-healthy samples and localizes pathologies based on anomaly scores. Overall, the proposed model offers a promising solution for improving AEs for anomaly detection.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method is expected to be effectively applied for building an anomaly detection system in the medical domain, especially in scenarios where large-scale collections of normal samples are feasible but collecting annotated samples can be challenging and costly. Additionally, the code for the proposed method is already available. In addition, the authors’ focus on the two properties of minimality and sufficiency, which are essential for AEs to detect anomalies, is presented with an interesting empirical observation (shown in Fig. 2).
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

However, the limitation of AE highlighted in this paper has already been previously identified in the literature. Therefore, in order to accurately evaluate the proposed method, it is necessary to compare it with existing solutions (e.g., [1-2]) that have been proposed to address the same limitation. Unfortunately, the competing methods used in this paper do not appear to have been designed to address the limitation of AE that is highlighted. Furthermore, while the proposed method’s use of deformation is critical in reconstructing pathology samples as pseudo-healthy samples, the paper lacks clear and detailed explanations of the motivation, design rationale, and technical implementation of the deformation component. This lack of details may make it difficult for other researchers to reproduce and extend the findings of the paper.

[1] Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection, Conference on Computer Vision and Pattern Recognition. 2019. [2] Autoencoding under normalization constraints, International Conference on Machine Learning. 2021.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors already have a source code that seems ready to be released, but implementation details are still missing from the paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. Deformation details: a) Can the authors provide more details about the shared encoder-decoder architecture? b) If the shared encoder-decoder is a separate autoencoder, it seems that the left-hand side of the full objective equation is missing two parameters (theta_S and Phi_S). Can the authors clarify this point? c) Can the authors provide more details about the deformation process, including how the deformation maps are estimated, how they guide the training process, and how they can be beneficial at inference time? d) Can the authors explain why LNCC is used in Morphometric adaptation instead of MSE? e) In Equation (2), is it x or x_rec that is compared to x_morph in the LNCC term? This seems to be inconsistent with Figure 3.
2. Dataset and implementation details: a) Can the authors describe how they split the data into train and test sets? b) Was anomaly detection on healthy samples also evaluated? Does the proposed method yield low anomaly scores across the entire area of healthy samples? c) Can the authors explain how the values in the “Healthy” column in Table 1 and the “Pathology” row in Figure 5(a) were measured? d) How was the “Covid-19 radiography database on Kaggle” used in this paper? e) Can the authors provide information about the architecture of the proposed autoencoder, learning parameters, and performance metrics used in the experiments?
3. Generalized content: While this paper addresses the specific limitations of AE for anomaly detection, the sentence in line 3 of the Background section and the title of Section 2.1 seem to be too generalized.
  - Background - line 3: “This section aims to discuss the common assumption of unsupervised anomaly detection in general”
  - Title of Section 2.1: “Unsupervised Anomaly Detection: Assumptions”
  Can the authors consider revising these to more accurately reflect the paper’s contributions?
4. Confusing sentences: Can the authors clarify whether the assumption mentioned in the last paragraph of Section 2.1 is introduced by the authors? If not, it would need a reference, because it seems to be a different assumption from the one highlighted in this paper. Note that one cannot say that poor reconstruction of data unseen during model training is similar to the reconstruction of training data in general.
5. Further questions: a) Healthy images in CXR seem somewhat standardized. If there are several types of healthy CXR images, how can the proposed method handle them? b) Can the authors provide more details on the reason for the trend in Figure 2 and the experimental setup in which these results were obtained?
  The reviewer is concerned that other factors might be influencing these results, such as different bottleneck dimensions, overfitting, or underfitting.
6. Minor comments: a) In Section 2.2, line 5, theta may be missing for the embedding z. b) On page 6, lines 4 and 6, psi may be missing for Phi. c) In Equation 2, Phi in the rightmost term should probably be replaced with psi. d) In Table 1, it is not clear what “Avg” means. If it means the average of Pneumonia and Covid-19, it would be helpful to place it in the rightmost column to avoid confusing readers. e) There is a contradiction between the sentences in the Figure 4 caption and line 1 of the Results.
  - Figure 4 caption: “Only MorphAEus yields pseudo-healthy reconstructions”
  - Results, line 1: “adversarially-trained AEs can reconstruct pseudo-healthy images from abnormal samples”
  f) In the last line of Section 4, the wrong numbers might have been used.
  - “Thereby, the average pathology detection increases from x to y,”
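
Regarding comment 1(d) above on LNCC versus MSE: the reviews do not state the authors' motivation, but one well-known property of local normalized cross-correlation is that it is insensitive to local affine intensity changes, whereas MSE penalizes them. The sketch below illustrates that property on a toy image. The window size, the unsquared NCC variant, and all names are assumptions for illustration, not the loss implemented in MorphAEus.

```python
def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / ((var_a * var_b) ** 0.5 + eps)


def lncc(x, y, win=2):
    """Mean NCC over all win x win sliding windows of two 2D images."""
    h, w = len(x), len(x[0])
    scores = []
    for i in range(h - win + 1):
        for j in range(w - win + 1):
            px = [x[i + di][j + dj] for di in range(win) for dj in range(win)]
            py = [y[i + di][j + dj] for di in range(win) for dj in range(win)]
            scores.append(ncc(px, py))
    return sum(scores) / len(scores)


def mse(x, y):
    """Mean squared error between two 2D images."""
    diffs = [(p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry)]
    return sum(diffs) / len(diffs)


# Toy image (made up) and a copy with changed contrast and brightness.
x = [[0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [2.0, 4.0, 9.0]]
brighter = [[2.0 * v + 1.0 for v in row] for row in x]

similarity = lncc(x, brighter)  # close to 1: local structure is unchanged
error = mse(x, brighter)        # large: raw intensities differ
```

Used as a training objective, such a similarity is typically turned into a loss, for example 1 - LNCC, so that structural agreement is rewarded independently of local intensity scaling.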
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Insufficient description of the proposed method.
- Due to missing comparisons and discussions with competitors dealing with the same assumption in the literature, it is not possible to evaluate the proposed method fairly.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

While autoencoders have been widely applied for anomaly detection, their limitation in reconstructing out-of-distribution samples remains a challenge. This paper tackles the limitation of autoencoders in reconstructing unseen structures, and the experimental results demonstrate significant improvement. Two out of three reviewers recommended accept and one recommended weak reject. All reviewers confirmed the merits of addressing the limitation of autoencoders and the clear performance improvements. The major concerns include an insufficient description of the method and missing experimental comparisons. Overall, this is a good paper tackling a challenging problem, and the proposed method is interesting. Thus, a decision of Provisional Accept is recommended. However, the reviewers' concerns should be addressed thoroughly in the final version.

Author Feedback

We appreciate the reviewers’ positive feedback on our manuscript and their recognition of the originality and valuable insights provided by our proposed work. We are grateful for their acknowledgment of our extensive evaluation, rigorous ablation studies, and the improvements our approach offers over the state-of-the-art (SOTA) methods while addressing auto-encoder limitations.

In our work, we have included seven baseline implementations of auto-encoders, covering various popular choices. These baselines encompass traditional approaches like VAE, as well as advanced techniques that aim to constrain the latent manifold through the disentanglement of latent factors (such as beta-VAE). Additionally, we have incorporated more recent developments that demonstrate exceptional generation accuracy while enforcing constraints on the learned distribution using adversarial losses (such as AAE and SI-VAE). We thank the reviewers for their additional suggestions, which were not included due to space constraints but will greatly add to the extensions of this work.

We appreciate the reviewers’ interest in additional details about our method. Due to space constraints, we couldn’t include all the requested information in the paper. However, the provided code will help address implementation and architectural questions. We have carefully considered all the suggestions to improve the clarity of our work.

We plan to extend our research in various directions. Firstly, we will explore the synergistic effect of jointly training the AE and the deformer, as indicated by promising initial experiments on different modalities. Secondly, we aim to conduct a thorough analysis of false positive detections. Lastly, we are excited about expanding our work to different anomaly types and modalities, building upon promising early experiments. We appreciate the valuable input from the reviewers and will incorporate their suggestions into our future research.

We appreciate the reviewers’ valuable feedback and insightful suggestions, which have strengthened our manuscript and inspired us for future extensions. We are grateful for this constructive dialogue and remain committed to incorporating the reviewers’ comments to enhance the clarity and comprehensiveness of our manuscript.

What Do AEs Learn? Challenging Common Assumptions in Unsupervised Anomaly Detection