
Authors

Jinghan Sun, Dong Wei, Zhe Xu, Donghuan Lu, Hong Liu, Liansheng Wang, Yefeng Zheng

Abstract

Chest X-ray (CXR) anatomical abnormality detection aims at localizing and characterizing cardiopulmonary radiological findings in radiographs, which can expedite clinical workflow and reduce observational oversights. Most existing methods attempt this task either in fully supervised settings, which demand costly per-abnormality annotations at scale, or in weakly supervised settings, which still lag far behind fully supervised methods in performance. In this work, we propose a co-evolutionary image and report distillation (CEIRD) framework, which approaches semi-supervised abnormality detection in CXR by grounding the visual detection results with text-classified abnormalities from paired radiology reports, and vice versa. Concretely, based on the classical teacher-student pseudo-label distillation (TSD) paradigm, we additionally introduce an auxiliary report classification model, whose prediction is used for report-guided pseudo detection label refinement (RPDLR) in the primary vision detection task. Inversely, we also use the prediction of the vision detection model for abnormality-guided pseudo classification label refinement (APCLR) in the auxiliary report classification task, and propose a co-evolution strategy in which the vision and report models mutually promote each other, with RPDLR and APCLR performed alternately. In this way, we effectively incorporate the weak supervision by reports into the semi-supervised TSD pipeline. Besides the cross-modal pseudo-label refinement, we further propose an intra-image-modal self-adaptive non-maximum suppression, in which the pseudo detection labels generated by the teacher vision model are dynamically rectified by high-confidence predictions from the student. Experimental results on the public MIMIC-CXR benchmark demonstrate CEIRD’s superior performance over several up-to-date weakly and semi-supervised methods. Our code will be available.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_35

SharedIt: https://rdcu.be/dnwcM

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    Authors propose a joint training scheme for chest x-ray abnormality detection task in a semi or weakly-supervised setting. They jointly train a text classification model on radiology reports and image detection model on x-ray images. The predictions of text modality are used in a student-teacher scheme to improve the predictions of the image model, and vice versa. The sequential joint training of models in two different modalities helps alleviate the noise inherent in regular student-teacher training based on a single modality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strongest aspect of the paper is a simple, but evidently impactful, idea. The authors’ main proposed modification to the regular student-teacher setup is given in Eq. 3. Briefly, during the student’s training (RetinaNet), they create pseudo-labels through a teacher model as usual, but only keep the pseudo-labels that are also supported by the radiology report. To check whether a teacher-generated pseudo-label is present in the radiology report, the authors use a text classification model (BERT), trained separately in a semi-supervised way on the text reports.
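    For concreteness, a minimal sketch of this report-guided filtering in PyTorch; the tensor shapes, the `report_probs` input, and the 0.5 threshold are my illustrative assumptions, not the paper’s exact Eq. 3:

        import torch

        def report_guided_filter(boxes, labels, scores, report_probs, cls_thresh=0.5):
            """Keep teacher pseudo-boxes only for abnormality classes that the
            report classifier also marks as present (illustrative sketch).

            boxes:        (N, 4) teacher-generated pseudo-boxes
            labels:       (N,)   abnormality class index per box (long tensor)
            scores:       (N,)   teacher detection confidences
            report_probs: (C,)   per-class probabilities from the report classifier
            """
            present = report_probs >= cls_thresh  # (C,) classes the report deems present
            keep = present[labels]                # (N,) keep boxes of report-confirmed classes
            return boxes[keep], labels[keep], scores[keep]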

    The authors also suggest iteratively improving the text and image models. Under this setup, they first train the student image model after sifting the pseudo-labels generated by the teacher image model through the student text model. Then, they fine-tune the student text model on labels generated by the teacher text model but sifted by the student image model. They repeat this process twice.
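    Schematically, the alternation could look as follows; all model interfaces (`detect`, `classify`, `train_step`) and the helper `rectify_with_detections` are hypothetical placeholders rather than the authors’ actual code, and `report_guided_filter` refers to the sketch above:

        def co_evolve(vision_teacher, vision_student, text_teacher, text_student,
                      unlabeled_loader, rounds=2):
            """Alternate RPDLR (text refines vision pseudo-labels) and APCLR
            (vision refines text pseudo-labels) for a fixed number of rounds."""
            for _ in range(rounds):
                # RPDLR: train the vision student on teacher pseudo-boxes
                # sifted by the current text model.
                for images, reports in unlabeled_loader:
                    boxes, labels, scores = vision_teacher.detect(images)
                    report_probs = text_student.classify(reports)
                    refined = report_guided_filter(boxes, labels, scores, report_probs)
                    vision_student.train_step(images, refined)
                # APCLR: fine-tune the text student on teacher text labels
                # rectified by the (now improved) vision student.
                for images, reports in unlabeled_loader:
                    pseudo_cls = text_teacher.classify(reports)
                    detections = vision_student.detect(images)
                    refined_cls = rectify_with_detections(pseudo_cls, detections)
                    text_student.train_step(reports, refined_cls)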

    The authors report improved performance on the MIMIC-CXR dataset, supported by an adequate ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The two main weaknesses of the paper are using a single dataset to demonstrate the results and reporting a single run (or, if multiple runs were made, not reporting the variance).

    Another aspect of the reporting I find lacking is the baselines. In addition to the reported benchmark techniques, I would have liked to see a comparison against a simpler baseline, e.g., a detection model with an additional head that predicts the report labels (referred to as the “oracle” in the paper).

    Finally, I find the focus on the iterative training, i.e., co-evolution, to be misleading. As reported in the ablations, co-evolution has a much smaller impact than the non-maximum suppression. Consequently, I believe the authors do not discuss the self-adaptive non-maximum suppression adequately.
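    For illustration, one plausible reading of that component (pool the teacher’s pseudo-boxes with high-confidence student predictions, then let standard NMS arbitrate overlaps) is sketched below; the pooling rule and both thresholds are my assumptions, not the paper’s exact mechanism:

        import torch
        from torchvision.ops import nms

        def self_adaptive_nms(t_boxes, t_scores, s_boxes, s_scores,
                              student_conf=0.9, iou_thresh=0.5):
            """Pool teacher pseudo-boxes with confident student predictions,
            then suppress duplicates with standard NMS (illustrative sketch)."""
            keep_s = s_scores >= student_conf               # confident student boxes only
            boxes = torch.cat([t_boxes, s_boxes[keep_s]])
            scores = torch.cat([t_scores, s_scores[keep_s]])
            keep = nms(boxes, scores, iou_thresh)           # indices of surviving boxes
            return boxes[keep], scores[keep]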

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Despite its conceptual simplicity, the proposed training method has many parts working together. The authors refer to another paper for hyperparameter selection, since they used the default settings from that work. The authors also state that the code will be released, but it is difficult to assess the reproducibility at present.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It is unclear why the ablation results of (a)+(b)+(c) in Table 2 are different from the main results in Table 1.

    Typos in Figure 1: “cadiomegaly” -> “cardiomegaly”, “edama” -> “edema”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the limited scope of the evaluation I believe the core idea of the paper has merit. Utilizing the information present in text form, e.g. radiology reports or other side information, to boost the performance of image models is not a new idea. However, to the best of my knowledge, the authors propose a novel method for utilizing this information and showcase their method’s applicability. The evaluation protocol is lacking but just the simplicity and the applicability of the core idea makes me believe the community will benefit from this work.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper presents a new approach to semi-supervised abnormality detection in CXR, where report classification of abnormalities is guided by visual detection and vice versa, achieved via an interactive co-evolution training scheme. An NMS module is further used to improve the quality of the pseudo-labels. Experiments are conducted on the MIMIC-CXR dataset, in comparison to a variety of weakly supervised, fully supervised, and semi-supervised baselines. Ablation studies are also presented.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The idea of co-evolution, using report classification to guide visual detection and vice versa, is novel.

    • The paper is overall well written, and methodology and experiments well described.

    • Experimental results demonstrate the improvements achieved by the presented method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The results can be improved by adding statistics from different random seeds.

    • Does the iterative co-evolution training increase the computational cost of the training? Some comparison of training cost and parameter count would be interesting.

    • Is it true that the performance gain was mainly obtained in the first iteration (as suggested by Fig. 3)? Does this suggest that perhaps no iterative evolution is necessary? Please comment.

    • Do the semi-supervised baselines include methods that leverage report information? If not, it may not be clear whether the benefits come from the proposed co-evolution strategy, or from merging additional information from reports.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will share their code and implementation with the research community, which will help ensure the reproducibility of the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper can be improved by addressing the comments raised in the weakness section. A more in-depth analysis would help clarify how model performance changes during the evolution, and whether the iterative procedure is needed in general (or whether simply merging the information from the report would achieve the reported gain).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presented methodology has novelty and the experiments are overall reasonably designed. The comparison of performance among weakly supervised, fully supervised, and semi-supervised models provides interesting insights, and the ablation study as well as the results of the vision and report models outlined in Fig. 3 are appreciated. The paper can be improved by addressing the questions related to statistics of the results, computational cost of the method, and whether the authors believe the performance gain arises mainly from the iterative co-evolution or from the static fusion of report information into CXR abnormality detection.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This work introduces a novel approach to detecting anatomical anomalies in chest X-rays (CXRs) through a semi-supervised learning method that incorporates both image and report data. The main innovations of this study are: (1) the elimination of pseudo-label noise through self-adaptive NMS and report-guided pseudo-label refinement, and (2) the co-evolutionary updating of the image detection and text classification models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The task the authors discuss in this work is a popular and crucial problem in the field, and the proposed solution has the potential to impact and enhance other chest X-ray CAD solutions;
    2. The proposed approach is noteworthy for its innovative design that effectively leverages partially labeled image and text report data;
    3. The results of the quantitative evaluations demonstrate a noticeable improvement in performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The motivations of some technical designs are not well explained or are unclear;
    2. Some details regarding the training of the cyclic pseudo-label refinement (e.g., the refinement frequency and how it impacts the final results) are missing;
    3. The baselines compared in the experiments might be too weak (e.g., CAM); the most recent weakly supervised detection methods should be compared;
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will release the code and model upon acceptance. The training configurations discussed in the paper should be sufficient for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors should clarify the motivation for keeping the teacher model fixed. In the teacher-student paradigm, the teacher model is usually updated as an exponential moving average (EMA) of the student model’s weights. With EMA updating, the teacher can be seen as a temporal ensemble of the student, which usually yields more robust pseudo-label generation; in that case, the self-adaptive NMS might not be necessary. In short, the reviewer doubts why the authors keep the teacher’s weights fixed in this work; the motivation should be discussed in the paper (a standard EMA update is sketched after this list for reference);
    2. From the reviewer’s perspective, the frequency of the co-evolution refinement of the text and image models should be carefully designed: refining too frequently might introduce noise and eventually impede model training. Therefore, the authors should discuss how to determine the optimal refinement frequency;
    3. The baselines (e.g., CAM) may not be strong enough for comparison; one or more of the most recent weakly supervised detection methods should be compared;
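    For reference, the standard Mean Teacher EMA update mentioned in point 1 is only a few lines (a generic PyTorch sketch, not code from the paper):

        import torch

        @torch.no_grad()
        def ema_update(teacher, student, decay=0.999):
            """Update the teacher as an exponential moving average of the
            student's weights (Mean Teacher-style temporal ensembling)."""
            for t_param, s_param in zip(teacher.parameters(), student.parameters()):
                t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)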

    Suggestions:

    1. The authors should refine the layout of Table 2 and Fig. 3; the figure is currently too small to read, and the table is a little confusing at first glance;
    2. As mentioned in the experiments, this work only addresses frontal-view CXR diagnosis. A more comprehensive solution should include multi-view feature fusion and reasoning to capture more complicated scenarios;

    Question:

    1. Radiology reports may sometimes indicate uncertainty in the imaging findings (e.g., “cannot rule out” a given finding). The reviewer is curious how the authors tackle this kind of corner case.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and interesting, and the experimental results demonstrate its effectiveness. However, some technical details are missing in the current version, and the baselines in the comparison study are relatively weak and dated. As a result, I vote for weak accept in the initial round of review.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a novel approach to semi-supervised abnormality detection in chest X-rays (CXRs) by incorporating both visual detection and report classification in an interactive co-evolution training scheme. The reviewers highlight several strengths of the paper, including the novelty of the co-evolutionary approach, the well-written methodology and experimental descriptions, and the demonstrated improvements achieved by the proposed method. However, weaknesses pointed out by the reviewers include the lack of statistical analysis using different random seeds, the absence of a comparison of computational costs, the uncertainty regarding the necessity of iterative evolution, and the need for clearer baselines that leverage report information. Despite these weaknesses, the reviewers acknowledge the paper’s potential. It is recommended to take the reviews into account, address the raised questions, and clarify certain aspects in the camera-ready.




Author Feedback

N/A


