
Authors

Philip Müller, Felix Meissen, Johannes Brandt, Georgios Kaissis, Daniel Rueckert

Abstract

Pathology detection and delineation enable the automatic interpretation of medical scans such as chest X-rays while providing a high level of explainability to support radiologists in making informed decisions. However, annotating pathology bounding boxes is time-consuming, so large public datasets for this purpose are scarce. Current approaches thus use weakly supervised object detection to learn the (rough) localization of pathologies from image-level annotations, which is, however, limited in performance due to the lack of bounding box supervision. We therefore propose anatomy-driven pathology detection (ADPD), which uses easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies. We study two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels. Our results show that our anatomy-level training approach outperforms weakly supervised methods and fully supervised detection with limited training samples, and that our MIL approach is competitive with both baseline approaches, demonstrating the potential of our approach.
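To make the two training setups concrete, the sketch below is a minimal, simplified illustration (not the authors' released implementation) of the two supervision signals named in the abstract. It assumes a backbone that already yields one pathology logit vector per detected anatomical region; the noisy-OR pooling used for the MIL variant is one common choice among several.

```python
# Minimal sketch of the two training signals (hypothetical simplification,
# not the authors' exact ADPD implementation).
import torch
import torch.nn.functional as F

def anatomy_level_loss(region_logits, region_labels):
    """Supervised variant: anatomy-level pathology labels per region.

    region_logits: (num_regions, num_pathologies) raw scores per anatomical region
    region_labels: (num_regions, num_pathologies) binary anatomy-level labels
    """
    return F.binary_cross_entropy_with_logits(region_logits, region_labels)

def mil_image_level_loss(region_logits, image_labels):
    """MIL variant: only image-level pathology labels are available.

    Regions are treated as instances of a bag (the image); predictions are
    pooled with noisy-OR (max pooling would be another common choice).
    """
    region_probs = torch.sigmoid(region_logits)             # (num_regions, num_pathologies)
    image_probs = 1 - torch.prod(1 - region_probs, dim=0)   # noisy-OR over regions
    return F.binary_cross_entropy(image_probs.clamp(1e-6, 1 - 1e-6), image_labels)

# Toy usage with random tensors: 3 anatomical regions, 4 pathology classes.
logits = torch.randn(3, 4)
print(anatomy_level_loss(logits, torch.randint(0, 2, (3, 4)).float()))
print(mil_image_level_loss(logits, torch.randint(0, 2, (4,)).float()))
```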

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_6

SharedIt: https://rdcu.be/dnwb4

Link to the code repository

https://github.com/philip-mueller/adpd

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper addresses a shortcoming in medical imaging ML: there are few datasets that are exhaustively (and accurately) labeled for the pathology evident in each image. The authors argue that weakly supervised learning offers limited benefit. Annotating anatomical regions, however, does not require significant clinical expertise (my interpretation – not explicitly stated). The resulting anatomy-driven pathology detection technique addresses the shortcomings stated earlier. They demonstrate this idea with chest X-ray datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel / interesting idea.
    2. Some evidence of success.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not clear how the proposed idea would translate to subtle pathologies.
    2. How much clinical expertise is needed for annotation?
    3. What are the dataset size limitations?
    4. What challenges might be faced for non-CXR (chest X-ray) images?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code provided. Links to data provided. So, in general, reproducibility is adequate given the way the method has been described in the paper. However, a link to a GitHub repository (even if private at this stage) is not provided, and without that the description isn't sufficient for someone to recreate this work easily – but it is possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors have astutely pointed out that image datasets are inadequately annotated. They also recognize that producing high-quality clinical annotations is tedious. They propose lower-quality (?) or less precise anatomical annotations to improve weakly supervised classification. They compare this approach against multiple instance learning.

    Overall the work stands on its own and is quite interesting. But, it leaves too many questions unanswered:

    1. What level of expertise is needed to annotate anatomy?
    2. What amount of detail is needed in radiology/clinical reports to associate an anatomical region with disease?
    3. What are the dataset size lower bounds?
    4. How is accuracy measured (in the absence of ground-truth labels/bounding boxes)?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper promises a very interesting idea but leaves many questions unanswered. It describes its method fairly clearly, but the evaluation appears somewhat contrived. In principle the idea is very good, but there are significant gaps, and the results come from a non-standard strategy that the authors have adopted.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a new approach called Anatomy-driven Pathology Detection for automatic interpretation of medical scans, particularly chest X-rays, by detecting and delineating pathological regions. The authors address the scarcity of large public datasets for pathology detection by using easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies. They compare two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The results show that the ADPD model outperforms weakly supervised methods and fully supervised detection with limited training samples, demonstrating its potential in pathology detection.

    • The proposed ADPD model provides a promising direction for future research in automatic pathology detection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The approach depends on region proxies, which may not perform well for pathologies covering only a small part of a region, and it still requires supervision in the form of anatomical region bounding boxes.

    • The paper in its current form is not clinically feasible.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used two public datasets, and the approach is explained quite well, which makes it feasible to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors propose a new approach called Anatomy-driven Pathology Detection for the automatic interpretation of medical scans, particularly chest X-rays, by detecting and delineating pathological regions. They address the scarcity of large public datasets for pathology detection by using easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies, and they compare two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels. The article is well written and easy to understand; however, there are typos and grammatical errors, which the authors should fix. The authors use two public datasets, and the approach is explained quite well, which makes it feasible to reproduce. The paper seems convincing, and the authors provide a quite good ablation study that justifies the design of the proposed approach. The selected architectures, such as DenseNet, are a little bit outdated and could be replaced by more recent ones (SE-ResNeXt, EfficientNet, etc.). Overall, the paper is good.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technical novelty, reproducibility, and achieved results.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    This paper proposed an anatomy-driven pathology detection method to detect abnormalities in chest x-rays.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper utilizes pathology bounding boxes as supervision to detect abnormalities in chest X-rays. It outperforms weakly supervised methods, and the workload to annotate the dataset is also reduced.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of novelty. The idea of the paper is to classify abnormalities in patches (meaningful pathology regions) and combine the classification results into bounding boxes.
    2. Pathology bounding boxes are usually large compared to abnormalities such as masses and nodules. Using pathology bounding boxes (or combinations of them) cannot provide clinically important assistance in real practice.
    3. Different abnormalities occur in different pathology regions. For example, infiltration is always located in the lower part of the lungs. But the authors use the same method to classify abnormalities in all pathology regions.
    4. This method is not feasible for clinical practice in the emergency room. It requires pathology bounding boxes to generate bounding boxes for abnormalities, so radiologists would need to annotate pathology regions in the emergency room, which is not reasonable.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide the code and supplementary materials.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. This work lacks novelty. The idea of this paper is to classify abnormalities in patches (meaningful pathology regions) and combine the classification results into bounding boxes.
    2. The experiment is unclear to me. In Table 1, how do you compare image-based methods (such as CheXNet) with your method using an IoU threshold?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of novelty. The idea of this paper is to classify abnormalities in patches (meaningful pathology regions) and combine the classification results into bounding boxes.
    2. Not useful in clinical practice: this method requires pathology bounding boxes to generate bounding boxes for abnormalities, so radiologists would need to annotate pathology regions in the emergency room, which is not reasonable.
    3. Experiment unclear: the authors use an IoU threshold to compare against image-based state-of-the-art methods.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes an anatomy-driven pathology detection method for the automatic interpretation of medical scans, specifically focusing on chest X-rays. It addresses the issue of limited and accurately labeled datasets for pathology detection in medical imaging. The authors suggest using easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies, comparing two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels. The main strength of the paper lies in its novel idea and the evidence of success demonstrated through improved performance compared to weakly supervised methods. Additionally, the proposed approach shows potential for future research in automatic pathology detection. However, the paper has several weaknesses. It lacks clarity on how the proposed method would handle subtle pathologies or pathologies that cover only a small part of a region, the level of clinical expertise needed for annotation, and the clinical feasibility in practical real-world settings. The limitations of dataset size and the challenges for non-chest X-ray images are not adequately discussed. Furthermore, the evaluation strategy appears contrived, and there is a lack of clarity on accuracy measurement in the absence of ground-truth labels. Overall, while the idea is interesting, there are significant questions that hopefully can be addressed in the rebuttal.




Author Feedback

We thank ALL reviewers for their valuable feedback and appreciate that they acknowledge the novelty of our work (R1, R2, M1), its potential for future research (R1, M1), its empirical success (R1, R2, R3, M1), and the ablation study (R2).

The reviewers (R1, R2, R3, M1) mentioned a lack of clarity regarding the handling of smaller pathologies (like masses and nodules). For such pathologies covering only small parts of regions, we still predict the whole region. We consider this a limitation and already mentioned it on p.8 of the submission. Nonetheless, our method outperforms all baselines on these pathologies (see Table 1 of the supp. material). Also – according to our radiologist – in clinical practice, chest X-rays are not used for the final diagnosis of such pathologies (instead, CTs or biopsies are used), and even rough localization can be beneficial. We adapted the limitation section to address this point more thoroughly.

Some reviewers (R1, R3, M1) noted a lack of clarity in the evaluation strategy and in how accuracy is measured without ground-truth labels. We want to clarify that the evaluation dataset does contain bounding boxes for pathologies annotated by radiologists (see p.6), which we use for evaluation. Note the difference between anatomical region boxes (present only in the training set), which define relevant regions without considering the presence or size of abnormalities, and pathology boxes (present only in the evaluation set), which explicitly describe the localization of abnormalities. We will emphasize this distinction to avoid any confusion.

There were also concerns (R3) about the IoU-based metrics and the comparability with the baselines. We emphasize that our work focuses on the localized detection of pathologies and not on image-level classification. For evaluation, we thus include the “localization accuracy” metric (also used in the original paper of our evaluation set and in baseline works like AGXNet). We also use standard object detection metrics (AP, mAP), commonly used in weakly supervised object detection. Note that classification metrics like AUROC cannot be used for evaluating localization, and some related works include it solely for evaluating classification. Some of our baselines (e.g., CheXNet) focus on (image-level) classification and do not report quantitative localization results. However, these methods propose localization approaches, which we compare quantitatively with ours. We will clarify this in the camera-ready version.
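As a concrete illustration of an IoU-thresholded localization check (a simplification for exposition, not necessarily the exact metric definition used in the paper or in AGXNet), a predicted pathology box can be counted as correct when it overlaps a ground-truth pathology box with an IoU above a fixed threshold:

```python
# Sketch of an IoU-thresholded localization check (hypothetical simplification,
# not the paper's exact "localization accuracy" definition).
def iou(box_a, box_b):
    """Boxes given as (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def localized_correctly(pred_boxes, gt_boxes, threshold=0.5):
    """True if every ground-truth box is hit by at least one prediction."""
    return all(any(iou(p, g) >= threshold for p in pred_boxes) for g in gt_boxes)

# Example: prediction overlaps the ground-truth box with IoU ~0.47.
print(localized_correctly([(10, 10, 60, 60)], [(20, 20, 70, 70)], threshold=0.3))
```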

Considering clinical feasibility (R2, R3, M1), we highlight that our method does not require anatomical region boxes during inference, as it predicts them from the chest X-ray alone using the trained region detector (see Fig. 1). We agree, however, that some points are unclear and will address them in Section 3.2. We emphasize that this is a methodological study rather than an application study, such that clinical application is out of scope for this work. We agree with R1 and M1 that our work has huge potential for future research.

Regarding the annotation expertise (R1, M1), the physiological shape of a healthy subject’s thorax can be learned relatively easily, even by medical students without clinical experience.

We consider studying the dataset size lower bounds (R1, M1) out of scope but highlight that our method transfers well to other datasets (as shown by our experiments), such that it can be evaluated on smaller datasets without retraining. Regarding possible applications to non-chest X-ray images (R1), we see huge potential for modalities where abnormalities can be assigned to meaningful regions (e.g., abdominal CTs). In contrast, the application might be limited for other modalities. We will add this to the discussion section.

While we agree that different abnormalities occur in different regions (R3), our work focuses on a data-driven approach that finds the relevant correlations automatically.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have adequately addressed the concerns raised and have provided sufficient justification, clarifications, and additional insights. They acknowledge limitations and offer explanations, particularly regarding the handling of smaller pathologies and the evaluation strategy. They also emphasize the potential for future research and the transferability of their method to other datasets. With these revisions and the inclusion of the additional insights, the paper can make a valuable contribution to automatic pathology detection in medical imaging.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After reading all the comments from the reviewers and based on my own reading, although the reviewers agreed that the idea might be interesting, questions remain regarding the evaluation and experimental setting. In addition, clinical utility should be considered even for a methodology paper. Overall, I feel the weaknesses of this paper outweigh its merits.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a pathology detection method based on anatomical annotations for the automatic interpretation of chest X-rays. The idea is novel, and the performance is improved compared to other methods. It shows potential for future research in automatic pathology detection. In their rebuttal, the authors effectively answered the questions raised by the reviewers and meta-reviewers, clarifying limitations, evaluation details, clinical feasibility, etc.


