
Authors

Silvia D. Almeida, Carsten T. Lüth, Tobias Norajitra, Tassilo Wald, Marco Nolden, Paul F. Jäger, Claus P. Heussel, Jürgen Biederer, Oliver Weinheimer, Klaus H. Maier-Hein

Abstract

Classification of heterogeneous diseases is challenging due to their complexity, variability of symptoms and imaging findings. Chronic Obstructive Pulmonary Disease (COPD) is a prime example, being underdiagnosed despite being the third leading cause of death. Its sparse, diffuse and heterogeneous appearance on computed tomography challenges supervised binary classification. We reformulate COPD binary classification as an anomaly detection task, proposing cOOpD: heterogeneous pathological regions are detected as Out-of-Distribution (OOD) from normal homogeneous lung regions. To this end, we learn representations of unlabeled lung regions employing a self-supervised contrastive pretext model, potentially capturing specific characteristics of diseased and healthy unlabeled regions. A generative model then learns the distribution of healthy representations and identifies abnormalities (stemming from COPD) as deviations. Patient-level scores are obtained by aggregating region OOD scores. We show that cOOpD achieves the best performance on two public datasets, with an increase of 8.2% and 7.7% in terms of AUROC compared to the previous supervised state-of-the-art. Additionally, cOOpD yields well-interpretable spatial anomaly maps and patient-level scores which we show to be of additional value in identifying individuals in the early stage of progression. Experiments in artificially designed real-world prevalence settings further support that anomaly detection is a powerful way of tackling COPD classification. Code is at https://github.com/MIC-DKFZ/cOOpD.
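
A minimal sketch of the patient-level scoring step described above, assuming per-region negative log-likelihoods as OOD scores (function names and the aggregation choice are illustrative, not the authors' exact implementation):

```python
# Sketch: aggregate per-region OOD scores (higher = more anomalous) into one
# patient-level anomaly score. The top-fraction option is an assumption; the
# paper only states that region-level scores are aggregated.
import numpy as np

def patient_anomaly_score(region_scores: np.ndarray, top_fraction: float = 1.0) -> float:
    """Return a single patient score from an array of per-region OOD scores."""
    ranked = np.sort(region_scores)[::-1]                # most anomalous regions first
    k = max(1, int(round(top_fraction * ranked.size)))   # optionally keep only the worst regions
    return float(ranked[:k].mean())

# Example: plain mean over all regions (top_fraction=1.0), or a "worst 25%" mean:
# score = patient_anomaly_score(region_scores, top_fraction=0.25)
```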

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_4

SharedIt: https://rdcu.be/dnwGH

Link to the code repository

https://github.com/MIC-DKFZ/cOOpD

Link to the dataset(s)

https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?view_pdf&stacc=phs000951.v5.p5

http://www.asconet.net/html/cosyconet/projects


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes to reformulate Chronic Obstructive Pulmonary Disease (COPD) prediction from CT images as an anomaly detection task, which is done using a generative model operating on the self-supervised representation space of the images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A large dataset
    • Comprehensive evaluation with comparison to state-of-the-art
    • Significant improvement on the performance compared to state-of-the-art
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The writing could include more technical details. For example, although the method used in Section 2.1 is based on a well-known algorithm, a minimal explanation in this paper is necessary to make it easier to read and follow.
    • It is not clear what Fig 1.a demonstrates.
    • The idea is not entirely new as claimed in the paper. There is previous work on converting supervised task of covid-19 detection using CT into an anomaly detection task, e.g., https://www.nature.com/articles/s41598-021-87994-2
    • Minor: cOOpD abbreviation is not clear.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems reproducible; the data is open source and the code will be made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper’s writing could be improved by adding some details about the core methodology, making it easier to read and follow.

    The evaluation is comprehensive and illustrates that the proposed pipeline is able to produce reliable results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Important application, with comprehensive evaluation.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The classification of heterogeneous diseases like COPD is challenging due to their complex and varied nature, making it difficult to diagnose. The authors propose a novel approach to reformulate COPD binary classification as an anomaly detection task, using a self-supervised contrastive pretext model to learn representations of unlabeled lung regions. A generative model then learns the distribution of healthy representations and identifies abnormalities (stemming from COPD) as deviations. Patient-level scores are obtained by aggregating region OOD scores, and the authors show that this approach achieves better performance on two public datasets than the previous supervised state-of-the-art. The method also yields well-interpretable spatial anomaly maps and patient-level scores, which can be useful in identifying individuals in the early stage of progression.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel method that reformulates the COPD classification task as an anomaly detection problem. The authors propose a self-supervised learning method for this task. The proposed model outperforms state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Insufficient details are provided in the Method section. It is impossible to reproduce the experimental results.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experimental results cannot be replicated solely from the paper’s description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The idea of reformulating the COPD classification task as an anomaly detection problem is novel. The proposed method, which solves this problem under the self-supervised learning framework, seems reasonable. Experimental results show that the proposed method outperforms state-of-the-art methods.

    The Method section lacks sufficient details to replicate the experimental results. Specifically, the authors did not provide the architecture of the encoder and Normalizing Flow in Section 2. As the proposed method is an anomaly detection method, it is assumed that only healthy (negative) samples were used during training, but this remains unclear in Section 3. The proposed method provides a negative log-likelihood score for each subject, but the threshold for positive and negative classes is indeterminate. It would be helpful for the authors to clarify this point.
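
    For instance, an operating point could be selected on a validation set; a generic sketch of this common practice (not tied to the paper's implementation, where AUROC itself is threshold-free):

    ```python
    # Hedged sketch: choose a decision threshold on validation data by maximizing
    # Youden's J; evaluate ranking quality with threshold-free AUROC on test data.
    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    def choose_threshold(val_scores: np.ndarray, val_labels: np.ndarray) -> float:
        fpr, tpr, thresholds = roc_curve(val_labels, val_scores)  # labels: 1 = COPD, 0 = healthy
        return float(thresholds[np.argmax(tpr - fpr)])            # maximize Youden's J = TPR - FPR

    # auroc = roc_auc_score(test_labels, test_scores)             # threshold-free metric
    # preds = (test_scores >= choose_threshold(val_scores, val_labels)).astype(int)
    ```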

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of reformulating the COPD classification task as an anomaly detection problem is novel. The proposed method, which solves this problem under the self-supervised learning framework, seems reasonable. Experimental results show that the proposed method outperforms state-of-the-art methods.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This study developed a deep learning model (cOOpD) that uses CT images to model the distribution of healthy lung tissue and detect lung abnormalities. Inspiratory and expiratory CT images from the COPDGene cohort were used to train the model with extracted overlapping patch regions. The models were then validated using an external cohort, COSYCONET. Compared to previously published models, the cOOpD model obtained higher AUROC performance in the training and validation datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    An important strength of this study is that they utilized two independent and large COPD cohort studies. The comparison to previously published methods is also a strength.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some additional details in the methods section would improve clarity.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Details regarding the models and algorithms, and datasets used were provided. The code related to this will be made available upon acceptance. Some details related to the reported experimental results were also provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In paragraph 2 of the introduction it is mentioned that ‘intensity and textural-level imaging features from paired inspiration and expiration computed tomography scans (CT)’ were the initial efforts to characterize COPD. It can be argued that low attenuation areas below -950 HU (LAA950) extracted from inspiration CT images are the most common CT imaging feature, and they are not extracted from paired inspiration and expiration CT images. This sentence does not seem inclusive of the conventional quantitative CT measurements.

    In paragraph 2 in the introduction it is stated that: “Typically, for a supervised model to learn good decision boundaries, the labeled training dataset needs good coverage of the appearances of all classes.” What is a ‘good decision boundary’ and ‘good coverage’? This is not clear.

    The motivation in the introduction for not using binary class labels is unclear. Limitations about the standard methods for diagnosing COPD should be stated to motivate the use of imaging and deep learning.

    Were PRISm subjects included? These subjects may have increased disease on CT imaging despite normal FEV1/FVC.

    In the ‘Baselines’ methods section, it is unclear whether SotA, PatClass, MIL+RNN, and MIL+Att are existing models.

    There is no description of the statistical comparison in the methods section. Please indicate how you compared the performance of the models.

    The AUROC values for the ReContrastive and cOOpD models in the external validation dataset (COSYCONET) are fairly low compared to previously published deep learning models for COPD with binary classification from the literature. Some more discussion on why these values are lower than previous studies have reported is warranted.

    In the discussion section it is stated that “Using both inspiratory and expiratory images provides information about pulmonary vascular alterations and airway wall thickness not visible on the inspiratory scan alone.” Please clarify. Expiratory images are not typically used to quantify airways or vessels.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some important details missing from the methods section. Further, some more discussion on the comparison to previous models, and the added value of including expiration CT would be helpful.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors formulated COPD classification as anomaly detection. They use contrastive learning on healthy individuals to create a healthy distribution. They validate their method on two large-scale studies.

    Strength:

    The reviewers mentioned that viewing COPD classification as novelty detection is novel.

    Weakness:

    After going over the paper, I have some serious concerns about the quality of the paper:

    • The paper starts with the fact that COPD is heterogeneous. However, the paper does not address the heterogeneity of COPD; it aims at classifying COPD. My main concern is about the applicability/usability of the main method. To diagnose COPD, one does not need a CT image. The main diagnosis is made using Forced Expiratory Volume in 1 sec (FEV1), which does not involve radiation (CT imaging). Why is a CT image even needed for diagnosing COPD? When do you think this method would be useful if we can simply use FEV1? The only scenario that comes to my mind is an incidental finding from low-dose CT, where the patient’s respiratory state during the scan (i.e., the patient holding their breath) is not as well controlled as in COPDGene. But there is no such experiment in the paper. There is no experiment in the paper trying to predict FEV1 to see how well that information is preserved.

    • The paper does not compare against any self-supervised learning (SSL) method. There are many SSL methods developed in the community.

    • This is pointed out by all reviewers: the paper lacks detail in the method and experiment sections.

      • “There is no description of the statistical comparison in the methods section.”
      • “Using both inspiratory and expiratory images provides information about pulmonary vascular alterations and airway wall thickness not visible on the inspiratory scan alone.”
      • See the comments by the reviewers




Author Feedback

Dear AC & Reviewers,

We are very excited that all three reviewers recommend accepting our work for publication. The AC had concerns that overturned the collective recommendation for acceptance, so we will primarily address those.

The AC’s main concern was about the applicability/usability of the method, stating that “To diagnose COPD, one does not need a CT image”. In fact, Lowe et al. 2019 redefined the COPD diagnosis to include chest CT, which is now officially recommended by GOLD 2023 for cases of persistent exacerbations, symptoms out of proportion to disease severity on lung function testing, FEV1 < 45% with significant hyperinflation and gas trapping, or for those who meet the criteria for lung cancer screening. Apart from the emerging clinical role of CT, countless research works have studied quantitative CT biomarkers ([4], Kahnert K et al. 2023), which may complement classical diagnostic methods. As mentioned by R3, classic diagnostic methods are limited (Andreeva E et al. 2017), and we aim to motivate this better in the introduction. By providing a global quantification of CT findings, our approach may be of clinical value for follow-up and therapy monitoring, complementing classic diagnostic methods rather than replacing them.

Following this idea, the AC also added that “There is no experiment in the paper trying to predict FEV1 to see how well that preserves that information”. Previous research has shown that imaging-derived parameters predict FEV1 to some extent (Park H et al. 2023), but also provide clinically useful information beyond spirometry [18]. Although we do not directly predict FEV1, this relation can be extrapolated from Fig. 2b, where the relationship between the anomaly score and GOLD stage (derived from FEV1) is depicted. Our future work will focus on the further validation of our proposed method for clinical use, which we believe is out of scope for this work.

The AC also stated that “the paper does not address the heterogeneity of COPD”. We disagree, as we provide well-interpretable anomaly maps (Fig. 2c) that tackle the spatial heterogeneity, where diseased regions appear to overlap with higher anomaly scores. Further clinical work will focus on a thorough evaluation of this important feature.

The AC suggested a further comparison of our pretext task (SimCLR) to other self-supervised learning (SSL) methods. However, our contribution was to move from supervised classification to anomaly detection. Therefore, comparisons were made to SOTA supervised methods, which all reviewers highlighted as comprehensive. We agree that other SSL methods could be compared; however, as they are not necessary to support the claims we make, we leave this for future work.

Although R1 & R3 stated that the paper is easily reproducible, as details on the models were provided, the datasets are public, and the code will be released upon acceptance, some additional details on the core methodology were suggested to improve clarity. We will add information on the Normalizing Flow, whose implementation is identical to [15] and based on RealNVP.
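
For clarity, a minimal sketch of the kind of RealNVP-style affine coupling and negative log-likelihood scoring referred to above (the single coupling block, layer sizes and variable names are illustrative assumptions; the actual implementation follows [15]):

```python
# Sketch: one RealNVP-style affine coupling block on feature vectors, plus the
# negative log-likelihood under a standard Gaussian base density used as an
# OOD score (higher NLL = more anomalous region).
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                              # keep scales bounded for stability
        z = torch.cat([x1, x2 * torch.exp(s) + t], dim=1)
        log_det = s.sum(dim=1)                         # log|det Jacobian| of the transform
        return z, log_det

def region_nll(z: torch.Tensor, log_det: torch.Tensor) -> torch.Tensor:
    log_pz = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=1)
    return -(log_pz + log_det)
```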

Some clinical questions were raised by R3:
• As described in the methods, we focused on never-smoker controls and GOLD 0-4 individuals; therefore, PRISm subjects were not included in this study.
• Regarding the question about information on airways and vessels from expiratory CT, we will discuss recent work in the discussion on gas trapping quantified on full expiration CT images (LAA-856%) as a surrogate marker for small airway inflammation (Gawlitza J et al. 2018; Cao X et al. 2021).
• The misleading wording (quantitative metrics use “either”, not “paired (…) scans”) will be clarified in the revised version.

As R3 noticed, the description of the statistical comparison was missing in the methods. This will be added to the revised version (paired samples t-test).
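
For concreteness, a minimal sketch of such a paired samples t-test on matched AUROC values from two models (the pairing unit, e.g. per fold or per bootstrap sample, and the numbers are purely illustrative):

```python
# Hedged sketch: paired samples t-test comparing two models on matched AUROC
# values; the values below are illustrative placeholders only.
from scipy import stats

auroc_model_a = [0.84, 0.86, 0.85, 0.83, 0.87]
auroc_model_b = [0.80, 0.81, 0.79, 0.78, 0.82]

t_stat, p_value = stats.ttest_rel(auroc_model_a, auroc_model_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```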

We hope the subsequent discussion will ensure a fair and balanced consideration of all perspectives. Thank you for allowing us to address the concerns raised.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After Rebuttal

    I agree with the authors that CT imaging has clinical value and can provide insight into the disease. However, due to radiation risk, the clinical use of CT for COPD diagnosis is limited to incidental findings, where the CT imaging parameters are much more challenging (low dose, less control over respiratory state, various reconstruction kernels), and this is not explored.

    Also, I disagree with the authors that the prediction of the GOLD score can be extrapolated to the predictive power for FEV1. The GOLD score is quite a coarse and arguably disputed metric for disease severity. The lack of comparison with SSL methods further reduces my enthusiasm about the results.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Proposes a technique for COPD classification, formulated as anomaly detection. Strengths are the novelty of the problem formulation and reasonable validation. The major concern from the MR was with regard to clinical applicability, which appears to have been better explained in the rebuttal. This text should probably be incorporated into the paper as well. The question about the relationship with FEV1 is not well explained even in the rebuttal. The authors have indicated that additional missing information will be included in the final version. Overall, the paper appears to merit acceptance based on the rebuttal and clarifications provided.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I tend to agree with the comments made by meta-reviewer #1. The value of CT in the diagnosis of COPD is not sufficiently clear, a potentially more promising scenario being low-dose CT lung nodule screening. However, the experiments did not address this aspect. If the model is intended for screening purposes, it may pose problems because the model was only trained on COPD and healthy cases, while real screening scenarios involve various other complex lung diseases such as pneumonia and lung cancer, and the model’s performance in such cases is uncertain. Furthermore, regarding the methodology, formulating COPD classification as an out-of-distribution (OOD) detection problem may not be a truly practical approach. It may not be possible for the performance of an OOD-based technique to surpass that of fully supervised learning models, which are considered more practical in medical imaging. Additionally, both R1 and R2 mentioned that the paper outperforms the state-of-the-art. However, Table 1 only compares four methods, and only one of them [21] is specifically related to COPD detection. Moreover, [21] included low-dose CT data and reported AUC values ranging from 0.866 to 0.934, which are higher than the AUC of 0.658 reported in this paper for [21]. Furthermore, [5] is an application to pathology images.


