
Authors

Behzad Bozorgtabar, Dwarikanath Mahapatra, Jean-Philippe Thiran

Abstract

Unsupervised anomaly detection in medical images such as chest radiographs is stepping into the spotlight, as it sidesteps the labor-intensive and costly expert annotation of anomalous data. However, nearly all existing methods are formulated as one-class classification, trained only on representations from the normal class, and discard a potentially significant portion of the unlabeled data. This paper focuses on a more practical setting, dual distribution anomaly detection for chest X-rays, using the entire training data, including both normal and unlabeled images. Inspired by a modern self-supervised vision transformer model trained with partial image inputs to reconstruct missing image regions, we propose AMAE, a two-stage algorithm for adaptation of the pre-trained masked autoencoder (MAE). Starting from MAE initialization, AMAE first creates synthetic anomalies from only normal training images and trains a lightweight classifier on frozen transformer features. Subsequently, we propose an adaptation strategy to leverage unlabeled images containing anomalies. The adaptation scheme assigns pseudo-labels to unlabeled images and uses two separate MAE-based modules to model the normative and anomalous distributions of the pseudo-labeled images. The effectiveness of the proposed adaptation strategy is evaluated with different anomaly ratios in the unlabeled training set. AMAE leads to consistent performance gains over competing self-supervised and dual distribution anomaly detection methods, setting a new state-of-the-art on three public chest X-ray benchmarks: RSNA, NIH-CXR, and VinDr-CXR.
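To make the two-stage recipe concrete, here is a minimal, runnable sketch of the procedure the abstract describes. It is an illustration under loud assumptions, not the authors' implementation: the tiny linear "encoder", the patch-paste anomaly synthesis, and the 0.5 pseudo-label cutoff are all hypothetical stand-ins.

```python
# Minimal sketch of the two-stage AMAE recipe from the abstract.
# Everything here is an illustrative stand-in, not the authors' code.
import torch
import torch.nn as nn

def make_synthetic_anomaly(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical anomaly synthesis: paste a flipped patch into the image
    (a crude stand-in for schemes such as AnatPaste mentioned in the reviews)."""
    x = x.clone()
    x[..., 8:16, 8:16] = x[..., :8, :8].flip(-1)
    return x

# Stage 1: frozen MAE-initialized encoder + lightweight classifier on its features.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64))  # stand-in for a ViT/MAE encoder
for p in encoder.parameters():
    p.requires_grad = False                                    # features stay frozen
classifier = nn.Linear(64, 1)                                  # normal vs. synthetic-anomaly head
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

normal_batch = torch.rand(8, 1, 32, 32)          # only normal training images
fake_abnormal = make_synthetic_anomaly(normal_batch)
x = torch.cat([normal_batch, fake_abnormal])
y = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
loss = bce(classifier(encoder(x)), y)
loss.backward()
opt.step()

# Stage 2: pseudo-label the unlabeled pool with the Stage-1 classifier, then
# adapt two separate MAE modules, one per pseudo-labeled distribution.
unlabeled = torch.rand(16, 1, 32, 32)
with torch.no_grad():
    p_abnormal = torch.sigmoid(classifier(encoder(unlabeled))).squeeze(1)
pseudo_normal = unlabeled[p_abnormal < 0.5]      # feeds the "normal" MAE
pseudo_abnormal = unlabeled[p_abnormal >= 0.5]   # feeds the "abnormal" MAE
# (Each MAE is then adapted with masked-image reconstruction on its subset; the
#  0.5 cutoff simplifies the per-class percentile thresholds used in the paper.)
```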

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_19

SharedIt: https://rdcu.be/dnwch

Link to the code repository

N/A

Link to the dataset(s)

https://www.kaggle.com/c/rsna-pneumonia-detection-challenge

https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection

https://nihcc.app.box.com/v/ChestXray-NIHCC/file/37164782321


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper essentially extends the DDAD model (or the very similar teacher-student methods) with a ViT pretrained as an MAE, a proxy anomaly-classification task, and a bootstrapping component.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Nice idea / realistic setting: a training set of healthy images plus a completely unlabeled set.
    • Novel idea: bootstrapping with artificial anomalies, then using a completely unlabeled, mixed dataset to find anomalous and healthy images on which a ‘teacher-student’-like MAE ViT can be trained.
    • Extensive number of baselines + reasonable number of datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I feel the main weakness of the paper is its writing style (which, imo, is quite confusing; without knowledge of prior work I would probably be lost here) and its unclear main message (on the one hand the paper proposes the “mixed” healthy + unlabeled training dataset, and on the other the methodological improvements over DDAD).
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most concepts should be quite easily reproducible. However, since this work uses a pretrained model and multiple stages, and no training code will be made available, I feel this would hinder reproducibility quite a lot.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I find it hard to pinpoint an exact spot to improve the readability of this paper, but a better restructuring of Section 2 would greatly improve it.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think overall it is a nice paper with nice incremental improvements. But the readability and clarity of the paper make it not really accessible to a reader without a lot of prior knowledge, and one has to re-read the paper multiple times to get the actual steps (even when working in this field of research).

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I feel the authors tried to address the issue, but improving the readability would likely need more than that. However, having also read the other reviews, I still feel the work itself can be accepted and thus stick with a weak accept.



Review #2

  • Please describe the contribution of the paper

    This paper presents a self-supervised method for anomaly detection in medical images by leveraging a pre-trained Masked Autoencoder and proposing adaptation strategies to address the challenge of limited labeled data. Experimental results on popular medical image datasets demonstrate the effectiveness of the proposed method, outperforming several state-of-the-art approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is well-organized and easy to follow.
    2. The proposed method is clear and reasonable.
    3. Experiments are conducted on several popular anomaly detection benchmarks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the proposed method is effective in identifying anomalies, it would be beneficial to extend it for anomaly segmentation as well, especially when it comes to medical diagnosis where localization is crucial. A comparison with state-of-the-art anomaly localization methods could be insightful.
    2. While the use of self-supervised pretraining is commendable, the proposed Self-Supervised Mask-Reconstruct method (Sec. 2.2 (1)) may seem redundant, since the pretrained MAE from [27] has already learned masked reconstruction for both normal and abnormal X-ray images.
    3. The use of a deeper backbone in the proposed method may not provide a fair comparison with the baseline DDAD [1] that uses a CNN-based architecture. To address this, the authors should replace the CNN-based architecture in DDAD [1] with the same pretrained transformer and report the results to make the comparison fair.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems reproducible, as the authors have provided many details of the proposed method. If the authors promise to release their code, I will consider increasing my score.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See the weakness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the weakness.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    As the rebuttal has partly addressed my main concerns, I will keep my original rating and vote for accepting this paper.



Review #3

  • Please describe the contribution of the paper
    1. This paper presents two-stage adaptation algorithms with pre-trained masked autoencoder for dual distribution anomaly detection in CXRs.
    2. The proposed method is capable of more effectively capturing anomalous features from unlabeled images.
    3. Experiments on the three CXR benchmarks demonstrate that AMAE is generalizable to different model architectures, achieving SOTA performance.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. well-written
    2. well-motivated
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For readers unfamiliar with AnatPaste, the description of stage 1 is still not clear enough, especially regarding the proxy task. Therefore, Fig. 1 (currently in the Supplementary Material) should instead be placed in the manuscript.
    2. Handcrafted threshold values are used for the probability-based confidence assessment and the subsequent pseudo-label assignment, which is a crude step.
    3. The mean maps in Eq. 3 are ambiguous and have not been explained by the authors.
    4. Table 1 does not prove that ‘AMAE-Stage 1 achieves SOTA results on two CXR benchmarks, demonstrating the effectiveness of pre-trained ViT using MAE and synthetic anomalies.’ A lot of important information is not described in the manuscript, including the backbone used by the SOTA methods listed in the table, whether pre-training initialization is used, and whether the pre-training weights used are consistent. Therefore the above comparison is biased and does not seem to be fair, and it does not prove the above points. The same is true for stage 2.
    5. The ablation experiments are incomplete. Comparisons of partial ablations against the overall model, of the internal components of stage 1 and/or stage 2, and the associated quantitative reports are missing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I have no idea.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The contributions are not emphasized in Section 1.
    2. There is no explanation of the basis for choosing the 75% masking ratio. (Sorry for my omission; I found the relevant experiment in the appendix, but I still suggest referencing this experiment in the manuscript.)
    3. Fig. 2 is hard to read.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please turn to the WEAKNESSES.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes a self-supervised method for anomaly detection in medical images by leveraging a pre-trained Masked Autoencoder. We have received mixed reviewer comments, with two reviewers suggesting weak accept and one weak reject. While the reviewers confirmed the merits of the paper in its method design and good performance, they also raised major concerns including poor readability and clarity, justification of the method design, fair comparison with existing studies, and incomplete ablation studies. Therefore, a decision of Invite for Rebuttal is recommended so that the authors can address the reviewers’ comments.




Author Feedback

We thank all reviewers for their insightful feedback. We are encouraged that our contributions are recognized as novel and as addressing a realistic setting. Please see our responses below:

R1 [writing style & reproducibility] We greatly appreciate your constructive comment. Our main contribution is a two-stage algorithm for adapting a pre-trained Masked AutoEncoder to leverage readily available unlabeled data. Unlike DDAD, which treats all unlabeled images alike, we assign pseudo-labels to unlabeled images and formulate anomaly detection by measuring the distribution discrepancy between normal images and pseudo-labeled abnormal images from the unlabeled set. To improve readability, we will move Fig. 1 from the Supp. to the final manuscript and add a description of Stage 1. We will also provide all training and adaptation details, e.g., hyperparameters, to improve reproducibility.

R2 [Extension to anomaly localization] Indeed, we first obtain pixel-level anomaly scores (Eq. 3), and the image-level score is obtained by averaging the pixel-level scores in each image. We do not report pixel-level metrics, as we have very few NIH samples (less than 10% of the test set) with rough bounding-box annotations, but we will add qualitative localization results to the revised paper if the page limit allows.
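For concreteness, here is a minimal sketch of the scoring step just described: a per-pixel anomaly map (a stand-in for the output of Eq. 3, which is not reproduced here) is averaged into a single image-level score. Shapes and names are illustrative.

```python
# Sketch: image-level anomaly score as the mean of per-pixel scores.
# The pixel map itself is a hypothetical stand-in for Eq. 3 of the paper.
import torch

def image_level_score(pixel_score_map: torch.Tensor) -> torch.Tensor:
    """pixel_score_map: (B, H, W) per-pixel anomaly scores -> (B,) image scores."""
    return pixel_score_map.mean(dim=(1, 2))

pixel_scores = torch.rand(4, 224, 224)        # per-pixel scores for 4 test images
print(image_level_score(pixel_scores).shape)  # torch.Size([4])
```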

[(Sec. 2.2 (1))] A great suggestion! Although Stage 2 (Sec. 2.2 (1)) slightly improves performance over the pre-trained MAE from Stage 1, owing to a slight distributional shift between source and target data, we agree that we can emphasize only (Sec. 2.2 (2)).

[The use of a deeper backbone] In Fig. 2 (c), we ablate our method using a pre-trained CNN architecture (DenseNet-121), achieving higher performance (84.5% AUC) than DDAD [1] on the RSNA dataset. Following this suggestion, we will also report DDAD with its backbone replaced by the same pre-trained transformer in the revised paper, if space allows.

R3 [AnatPaste] Wonderful suggestions! Following them, we will move Fig. 1 from the Supp. to the final manuscript and improve the readability of the proxy-task description.

[The handcrafted threshold] Since output probability distributions differ by class, instead of a single static threshold we select a confidence threshold per class, chosen as the top K-th percentile (K = 50) of the confidence values for each class.
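A minimal sketch of this per-class percentile thresholding, assuming binary pseudo-classes (normal vs. abnormal) and scalar confidences; the function and variable names are illustrative, not from the paper's code.

```python
# Sketch: per-class confidence thresholds at the top K-th percentile (K = 50).
# Samples at or above their class threshold would receive pseudo-labels.
import numpy as np

def per_class_thresholds(conf: np.ndarray, labels: np.ndarray, k: int = 50) -> dict:
    """conf: confidence per sample; labels: predicted class per sample."""
    return {c: np.percentile(conf[labels == c], 100 - k) for c in np.unique(labels)}

rng = np.random.default_rng(0)
conf = rng.random(1000)                       # hypothetical confidence values
labels = rng.integers(0, 2, size=1000)        # pseudo-classes: 0 = normal, 1 = abnormal
thr = per_class_thresholds(conf, labels)
keep = conf >= np.vectorize(thr.get)(labels)  # confident samples keep their pseudo-labels
print(thr, keep.mean())                       # roughly half of each class is kept
```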

[The mean maps in Eq. 3] In Fig. 1 (bottom), we illustrate the reconstructions from the two MAE modules at test time, which are then averaged to form the mean maps. We will add this explanation to the revised paper.
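As a rough illustration of that explanation, the sketch below averages several masked reconstructions per MAE module into a mean map and takes the per-pixel gap between the two modules' mean maps as an anomaly cue (in the spirit of the A_inter score mentioned in the ablation response below; the paper's exact Eq. 3 may differ). The reconstruct() stub is a hypothetical stand-in for a real MAE.

```python
# Sketch: mean maps from two MAE modules and their inter-module discrepancy.
import torch

def reconstruct(image: torch.Tensor, n_masks: int = 4) -> torch.Tensor:
    """Stand-in MAE: returns n_masks noisy 'reconstructions' of the image."""
    return image.unsqueeze(0) + 0.01 * torch.randn(n_masks, *image.shape)

image = torch.rand(1, 224, 224)                     # one test image (C, H, W)
mean_map_normal   = reconstruct(image).mean(dim=0)  # MAE trained on normal images
mean_map_abnormal = reconstruct(image).mean(dim=0)  # MAE trained on pseudo-abnormal images
inter_discrepancy = (mean_map_normal - mean_map_abnormal).abs()  # per-pixel anomaly cue
print(inter_discrepancy.shape)                      # torch.Size([1, 224, 224])
```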

[AMAE-Stage 1] The results in Table 2 show that AMAE-Stage 1, pre-trained either on ImageNet (“IN”) or on unlabeled chest X-rays, surpasses competing methods on the NIH-CXR and RSNA datasets, whether those methods are pre-trained on “IN”, trained end-to-end (“e2e”), or trained from scratch. The ablations in Fig. 2 (c) also indicate the superiority of AMAE-Stage 1 over baselines across architectures (DenseNet-121 or ViT) and pretraining schemes (MoCo v2 or MAE) on the RSNA dataset. We will refine and improve the description of both stages and, following the suggestion, add each method’s backbone as a column in Table 2.

[ablation experiment] We believe we have already included all ablations in Fig. 2 (a), e.g., Stage 2 (A_inter), referred to as the overall model, Stage 2 (Mask-Rec.), and Stage 1. Stage 1 has no internal components to ablate, as it does not utilize unlabeled images. Other ablations, for incorporating pseudo-labels and for sensitivity to pretraining and backbone, are included in Fig. 2 (b, c). Nonetheless, we would be glad if the reviewer pointed us to the missing ablation to add, as we did not fully understand what is missing.

[Other suggestions] We will emphasize the contributions in Section 1 and improve the readability of Fig. 2. The masking-ratio ablation is already in Table 1 of the Supp., which we will cite in the manuscript.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has partially addressed the reviewers’ concerns. Reviewers confirmed the merits of the methodology design and the improved performance, while the readability should be greatly enhanced and more implementation details added. Overall, these issues can be addressed properly in the final version.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studied the problem of anomaly detection in chest X-rays and designed an MAE-based method that can adapt to anomaly detection from normal and unlabeled images. The method is evaluated on three CXR datasets. The rebuttal provided additional details about the method, such as different backbones, thresholds, and results. The remaining major concern is the novelty/difference compared with existing methods; for example, based on the description of “Self-Supervised Mask-Reconstruct”, it is difficult to identify the differences between this part and MAE/SimMIM, etc.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although some concerns remain, the authors diligently addressed the reviewers’ concerns; I feel the paper is publishable at MICCAI.


