
Authors

Yu Cai, Hao Chen, Xin Yang, Yu Zhou, Kwang-Ting Cheng

Abstract

Chest X-ray (CXR) is the most common radiological exam for the diagnosis of various diseases. Because annotations are expensive and time-consuming, detecting anomalies in CXRs in an unsupervised fashion is very promising. However, almost all existing methods treat anomaly detection as a one-class classification (OCC) problem: they model the distribution of only known normal images during training and, in the testing phase, identify samples that do not conform to the normal profile as anomalies. A large number of unlabeled images containing anomalies are thus ignored during training, even though they are easy to obtain in clinical practice. In this paper, we propose a novel strategy, Dual-distribution Discrepancy for Anomaly Detection (DDAD), that utilizes both known normal images and unlabeled images. The proposed method consists of two modules. During training, one module takes both known normal and unlabeled images as inputs, capturing anomalous features from the unlabeled images, while the other models the distribution of only known normal images. Subsequently, the inter-discrepancy between the two modules and the intra-discrepancy within the module trained on only normal images are designed as anomaly scores to indicate anomalies. Experiments on three CXR datasets demonstrate that the proposed DDAD achieves consistent, significant gains and outperforms state-of-the-art methods. Code is available at https://github.com/caiyu6666/DDAD.
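For readers who want the two scores in concrete terms, below is a minimal sketch assuming each module is an ensemble of K reconstruction networks, and adopting the per-pixel standard deviation and L1 distance mentioned in the author feedback further down; the function name and array shapes are illustrative assumptions, not the authors' released implementation.

    import numpy as np

    def ddad_anomaly_maps(recons_a: np.ndarray, recons_b: np.ndarray):
        # Per-pixel anomaly maps from two ensembles of reconstructions.
        # recons_a: (K, H, W) reconstructions from module A, trained on
        #           normal + unlabeled images (shapes are assumptions).
        # recons_b: (K, H, W) reconstructions from module B, trained on
        #           known normal images only.
        mu_a = recons_a.mean(axis=0)   # ensemble-mean reconstruction, module A
        mu_b = recons_b.mean(axis=0)   # ensemble-mean reconstruction, module B
        inter = np.abs(mu_a - mu_b)    # inter-discrepancy: A vs. B (L1 distance)
        intra = recons_b.std(axis=0)   # intra-discrepancy: spread inside B (std)
        return inter, intra

Pixels where either map is large are flagged as anomalous; an image-level score can be obtained by averaging the map over the image.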

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_56

SharedIt: https://rdcu.be/cVRuH

Link to the code repository

https://github.com/caiyu6666/DDAD

Link to the dataset(s)

https://www.kaggle.com/c/rsna-pneumonia-detection-challenge

https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection


Reviews

Review #1

  • Please describe the contribution of the paper

In this manuscript, the authors propose a dual-distribution discrepancy method for anomaly detection. This is the first work to include unlabeled normal and abnormal images in training to improve anomaly detection. Abnormality is evaluated with two anomaly scores: intra- and inter-discrepancy. Experiments on two benchmarks show state-of-the-art results, and AUC increases as more abnormal data are included in training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This is the first work to include unlabeled data containing both abnormal and normal images.
2. A dual-distribution method is proposed that learns the inter- and intra-discrepancy between the two sets of reconstructions: one from the networks trained on the unlabeled dataset and one from the networks trained on the normal dataset.
3. The proposed method achieves state-of-the-art results, demonstrating the effectiveness of involving abnormal data in uncertainty-based training.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The word ‘label’ is not defined; presumably it refers to the normal/abnormal label. This can be confusing because ‘label’ could also mean a lesion annotation. Similarly, in the introduction, “To the best of our knowledge, it is the first time that unlabeled images are utilized to improve the performance of anomaly detection”: since all unsupervised methods use unlabeled data, ‘unlabeled images’ may confuse readers.
2. In training, if module A is trained on purely normal images, will an abnormal input go undetected because both modules generate normal reconstructions? Conversely, if module A is trained on all abnormal images, will normal inputs be wrongly flagged as abnormal with high inter-discrepancy, since one module generates an abnormal reconstruction and the other a normal one?
3. In the comparison with SOTA methods, why does AS intra outperform AS inter by such a large margin on one dataset, while the opposite is observed on the RSNA dataset?
4. When training on both normal and abnormal cases, will robustness be affected by the limited amount of abnormal data the network has seen?

    Minor:

    1. The fonts in Fig. 3 are too small to read.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors commit to releasing the source code, and the paper provides enough detail to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The detailed suggestions are integrated into the weaknesses section above: 1) the definition of the unlabeled data; 2) the question about the training cases in module A; 3) a more detailed explanation of the experiments; 4) the concern about the robustness of this model.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes a novel uncertainty-estimation method that utilizes unlabeled data alongside normal cases. This is the first work to do so, and it provides an alternative way to explore uncertainty estimation for detection in medical image analysis. The proposed method reaches state-of-the-art performance, which is promising, and the ablation study shows that performance increases as more diverse data are included. One remaining concern is the robustness of the model once abnormal data are used for training.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

This paper proposes a new strategy for anomaly detection based on labeled and unlabeled data, which is a novel idea. Experiments show the effectiveness of the method. This may change the traditional way of thinking about anomaly detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The method can use both unlabeled and labeled data for anomaly detection, which greatly improves the efficiency of anomaly detection and matches the data actually available in clinical applications.
2. The formulation is expressed relatively clearly, and both the comparison and ablation experiments demonstrate the effectiveness of the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Full terms and abbreviations should be used consistently: once an abbreviation is introduced, it should be used thereafter, and the full term should not be spelled out again.
2. Please explain how K was chosen.
3. There are some spelling mistakes, e.g., ‘fisrt time’.
4. Regarding the choice of training datasets: when comparing with other state-of-the-art methods, use datasets that those methods also use; otherwise the conclusions are likely to reflect overfitting. Please identify the datasets used by the other leading methods cited in the article and run comparison experiments on them.
5. The code should be open-sourced to demonstrate the reproducibility of the method, or other supporting evidence should be provided.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Given the uncertainty in the choice of datasets, more material is needed to demonstrate the reproducibility of the method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
(Identical to the five weaknesses listed above.)
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is generally good and offers a new perspective on the field, but the authors should answer the questions above before it is accepted for publication.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

Previous anomaly detection papers often treat the problem as one-class classification using only normal images. The authors of this work propose to leverage both normal images and unlabeled images containing anomalies during training to perform more accurate anomaly detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is mostly well-written and clear.
2. The method achieves good performance for anomaly detection in chest X-rays.
3. Most aspects are described in sufficient detail to enable reproduction of the results.
4. The authors claim this is the first work to utilize unlabeled images to improve the performance of anomaly detection.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The proposed setup is similar to noisy-label learning; it would be better if the authors explored some noisy-label baselines on this problem.
2. In practice, collecting a large number of normal images can still be time-consuming; it would be better if the authors showed the performance with fewer normal training images (e.g., hundreds instead of thousands).
3. In Figure 2, the comparison is a bit unfair given that the proposed model contains many more learnable parameters (K AEs for module A and K AEs for module B) than the baseline autoencoder.
4. It would be better if the authors compared with some recent SOTA anomaly detectors from 2021.
5. Some references are missing [1,2,3,4].

References:
[1] Tian, Yu, et al. “Constrained contrastive distribution learning for unsupervised anomaly detection and localisation in medical images.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.
[2] Dey, Raunak, and Yi Hong. “ASC-Net: Adversarial-based selective network for unsupervised anomaly segmentation.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.
[3] Marimont, Sergio Naval, and Giacomo Tarroni. “Implicit field learning for unsupervised anomaly detection in medical images.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.
[4] Chen, Yuanhong, et al. “Deep one-class classification via interpolated Gaussian descriptor.” AAAI 2022.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors claim they will provide code, per the reproducibility checklist.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The paper is well written and its novelty can be deemed sufficient. There are a few issues that can be addressed during the rebuttal.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written and its novelty can be deemed sufficient. Some of the weaknesses can be addressed (see above for details).

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

In this paper, the authors propose a simple approach to anomaly classification using a dual-network combination: the first network builds representations from a mixture of normal and abnormal images, while the second builds representations from normal images only. A discrepancy measure then evaluates the difference between the two distributions to conclude abnormality. This type of method has been used in the GAN context (see below) to show classification improvement in previous work, which could be interesting to compare with, at least rationale-wise. Please also address the other reviewers’ comments. There is still a lingering question about whether this is a general technique, since the incidence of anomalies in chest X-rays is probably higher than in other anomaly detection settings (e.g., in IT logs, anomalies may be rare). In such cases, the unlabeled distribution may be overwhelmingly similar to the normal-only one, so this method may work particularly well when the unlabeled training data contain a healthy mix of anomalies and normals.

    https://www.semanticscholar.org/paper/Semi-supervised-learning-with-generative-networks-Madani-Moradi/c80c08c2371ac3f4af5b0ad40721d185fbeae38d

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank the reviewers for their constructive comments on our paper. We summarize and answer several important questions below:

Q1: What is the performance of the proposed method in extreme situations? For example, anomalies may be rare in some settings (Meta-reviewer), or module A may be trained with purely normal/abnormal images (Reviewer #1). A1: As mentioned in Sec. 2.1, module A is trained on the combination of a normal dataset and an unlabeled dataset, so it will certainly capture normal features. As for the unlabeled dataset, the ablation study in Sec. 3.3 shows our performance under arbitrary anomaly ratios (AR) of the unlabeled dataset. According to the results, even when anomalies are rare or absent (i.e., AR = 0), our inter-discrepancy still outperforms previous reconstruction methods. This can be explained by the fact that, in this situation, the inter-discrepancy degenerates to the intra-discrepancy, as mentioned in the paper, while the intra-discrepancy, which uses only normal images, has been proven effective by our experiments and by prior work on Deep Ensembles.

Q2: About fair comparison: dataset choice (Reviewer #2) and number of learnable parameters (Reviewer #3). A2: The paper of another competitive method, AE-U, also uses the RSNA dataset, on which our method outperforms it by a large margin. To further prove the effectiveness of our method, we conducted a series of experiments on a new chest X-ray dataset, VinBigData, whose results will be reported in the camera-ready version. (Dataset link: https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection) To exclude the influence of the number of learnable parameters, we ensembled K models using the previous reconstruction method; this yields no improvement even though it contains the same number of parameters as our method (see Table 1), demonstrating that the improvement is due entirely to the proposed intra- and inter-discrepancy rather than to more learnable parameters. A sketch of this parameter-matched control is given after this paragraph.
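To make the parameter-matched control concrete, here is a minimal sketch under the assumption that the “previous reconstruction method” scores anomalies by per-pixel reconstruction error; the model list and shapes are hypothetical placeholders, not the authors’ released code.

    import torch

    def ensemble_recon_score(models, x):
        # Baseline control: K independently trained reconstruction networks,
        # matching DDAD's parameter count while keeping the anomaly score a
        # plain reconstruction error (hypothetical sketch).
        # models: list of K trained reconstruction networks.
        # x: batch of images, shape (N, 1, H, W).
        with torch.no_grad():
            errors = [torch.abs(m(x) - x) for m in models]  # per-model L1 error maps
        return torch.stack(errors).mean(dim=0)              # average over the K models

Because every ensemble member is trained on the same normal data with the same objective, averaging their reconstruction errors adds parameters but no new signal, which is consistent with the reported lack of improvement.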

Q3: In most situations, AS inter performs better than AS intra, consistent with the conclusion of our ablation study; however, on our private CXA dataset, for AE-U, AS intra outperforms AS inter by a large margin, which is unusual. (Reviewer #1) A3: We analyzed the samples in the CXA dataset and found some low-quality images (wrong orientation, scanning position offset from the chest, or severe distortion), which led to unreliable results on this dataset. We therefore performed quality control on our private CXA dataset to exclude low-quality images. The resulting dataset is named CXAD, and the corresponding experimental results are consistent with our conclusion; they will be updated in the camera-ready paper.

Other updates:

  1. We found that the standard deviation and L1 distance work better than the variance and L2 distance for our intra- and inter-discrepancy, so we will update the formulas of the anomaly scores and the experimental results accordingly (a hedged reading of the updated scores is sketched after this list).
  2. We will soon provide the source code and the dataset partition files to ensure reproducibility.
  3. Other minor mistakes and weaknesses mentioned by the reviewers will be corrected in the camera-ready version.
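To make update 1 concrete, one plausible reading of the revised per-pixel anomaly scores (our assumption, pending the camera-ready definitions), where $\hat{x}^{B,i}_p$ is the $i$-th of $K$ reconstructions of pixel $p$ from module B and $\mu^A_p$, $\mu^B_p$ are the ensemble-mean reconstructions of modules A and B:

    \mathcal{A}^{\text{intra}}_p = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\hat{x}^{B,i}_p - \mu^B_p\right)^2},
    \qquad
    \mathcal{A}^{\text{inter}}_p = \left|\mu^A_p - \mu^B_p\right|

That is, the standard deviation replaces the variance in the intra-discrepancy, and the L1 distance between the two ensemble means replaces the squared (L2) distance in the inter-discrepancy.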


