
Authors

Haoqin Ji, Haozhe Liu, Yuexiang Li, Jinheng Xie, Nanjun He, Yawen Huang, Dong Wei, Xinrong Chen, Linlin Shen, Yefeng Zheng

Abstract

Accurate abnormality localization in chest X-rays (CXR) can benefit the clinical diagnosis of various thoracic diseases. However, lesion-level annotation can only be performed by experienced radiologists, and it is tedious and time-consuming, thus difficult to acquire. This situation makes it difficult to develop a fully-supervised abnormality localization system for chest X-rays. In this regard, we propose to train the CXR abnormality localization framework via a weakly semi-supervised strategy, termed Point Beyond Class (PBC), which utilizes a small number of fully annotated CXRs with lesion-level bounding boxes and extensive samples weakly annotated by points. Such a point annotation setting provides weak instance-level information for abnormality localization at a marginal annotation cost. Particularly, the core idea behind our PBC is to learn a robust and accurate mapping from point annotations to bounding boxes that is invariant to the placement of the annotated points. To achieve this, a regularization term, namely multi-point consistency, is proposed, which drives the model to generate consistent bounding boxes from different point annotations inside the same abnormality. Furthermore, a self-supervised regularization, termed symmetric consistency, is also proposed to deeply exploit the useful information in the weakly annotated data. Experimental results on the RSNA and VinDr-CXR datasets justify the effectiveness of the proposed method: an improvement of ~5% in mAP can be achieved by our PBC, compared to the current state-of-the-art method (i.e., Point DETR).
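To make the core idea concrete, the following is a minimal sketch of the multi-point consistency term in PyTorch-style code. The `point_to_box_model(image, points)` interface is a hypothetical stand-in for a Point DETR-style network, not the authors' released implementation:

```python
import torch

def multi_point_consistency_loss(point_to_box_model, image, box, num_points=4):
    """Sample several points inside one annotated box (x1, y1, x2, y2) and
    penalize disagreement among the boxes predicted from each point."""
    x1, y1, x2, y2 = box
    xs = x1 + (x2 - x1) * torch.rand(num_points)    # random x-coords inside the box
    ys = y1 + (y2 - y1) * torch.rand(num_points)    # random y-coords inside the box
    points = torch.stack([xs, ys], dim=1)           # (num_points, 2)
    pred_boxes = point_to_box_model(image, points)  # (num_points, 4)
    # Every point inside the same lesion should map to the same box,
    # so penalize deviation from the mean prediction.
    mean_box = pred_boxes.mean(dim=0, keepdim=True)
    return ((pred_boxes - mean_box) ** 2).mean()
```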

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_24

SharedIt: https://rdcu.be/cVRta

Link to the code repository

https://github.com/HaozheLiu-ST/Point-Beyond-Class

Link to the dataset(s)

https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/overview

https://vindr.ai/datasets/cxr


Reviews

Review #1

  • Please describe the contribution of the paper

The paper follows the framework of Point DETR and introduces it to the field of abnormality localization in chest X-rays. In my opinion, the main contribution of the paper lies in the proposal of two regularization terms (multi-point consistency and symmetric consistency) within the Point DETR framework, which seem novel and are shown to be effective.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

While the work focuses on applying Point DETR to abnormality localization in chest X-rays, the two regularization terms (multi-point consistency and symmetric consistency) proposed to improve the original framework seem interesting and novel. The multi-point consistency cleverly utilizes the strongly labeled data to generate more point-labeled data and drives the model to generate consistent bounding boxes from different point annotations inside the same abnormality. The symmetric consistency adopts a self-supervision scheme and drives the model to generate consistent predictions under different transformations (flipping and masking); see the sketch below.
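To make the flipping case concrete, here is a minimal sketch of the symmetric-consistency idea under an assumed `model(image, points) -> boxes` interface with absolute pixel coordinates; it is illustrative only, not the authors' implementation (masking would follow the same pattern):

```python
import torch

def hflip_boxes(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes across the vertical image axis."""
    flipped = boxes.clone()
    flipped[:, 0] = image_width - boxes[:, 2]
    flipped[:, 2] = image_width - boxes[:, 0]
    return flipped

def symmetric_consistency_loss(model, image, points):
    """Predictions from the flipped input, mapped back to the original
    frame, should match predictions from the original input."""
    w = image.shape[-1]
    boxes = model(image, points)
    flipped_points = points.clone()
    flipped_points[:, 0] = w - points[:, 0]          # mirror the x-coordinate
    boxes_from_flip = model(torch.flip(image, dims=[-1]), flipped_points)
    return ((boxes - hflip_boxes(boxes_from_flip, w)) ** 2).mean()
```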

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The novelty is relatively limited. The work appears to be an application of the Point DETR framework to the field of chest X-rays. Although the two regularization terms seem novel and interesting, the work is an incremental improvement of Point DETR, since numerous such improvements to the framework are conceivable.
2. Some details need to be made clearer: (1) In the description of Step 3, it is said that “After the above two steps, we get a well-trained Point DETR (Fd(·, ·)), which is regarded as the teacher model to generate pseudo box labels for point-level weakly-annotated data.” However, I think a training stage is still needed for Point DETR to learn to predict box labels from point labels. Please clarify. (2) It is said that two models (FCOS and Faster R-CNN) are adopted as the student detector. I wonder which model is used for the systems in Table 1, and why the results in Table 1 do not match the results in Table 2.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors did not mention the code in the paper, but according to the checklist, it seems they will release it after acceptance. Please clarify. I suggest at least releasing the split of the training and test sets, which would help in pursuing the goal of establishing a benchmark for the field.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

I think the symmetric consistency has little to do with the point labels and could also be used with fully labeled data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The two regularization terms (multi-point consistency and symmetric consistency) proposed to improve the original framework seems interesting and novel, and experimental results proved the effectiveness. However, it seems an incremental improvement on an existing model, and some details need to be made clear.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

In this paper, the authors improve the existing abnormality localization pipeline with self-supervised learning. More specifically, they emphasize the importance of multi-point consistency and symmetric consistency for improving model robustness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The presentation is clear and easy to follow.
2. The proposed solution is driven by the medical formulation, and the self-supervised learning losses make great sense.
3. The authors have done a detailed analysis on a dataset by changing the backbones and the student model.
4. The released benchmark is likely to be useful to other researchers in the same field.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. I am not fully convinced of the effectiveness of the proposed model on a single dataset. If possible, validating the current results on another dataset would be extremely helpful.
2. In Figure 2, it would be better to point out the key differences from the existing DETR baseline.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the dataset is generally available, I do not see the code for replicating the work. The results may nevertheless be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Overall, I believe this is a good paper with clear motivation and a reasonable design. It would be stronger if:

1. the results were shown to be effective on at least one more dataset and the code were released;
2. the caption of Fig. 2 were improved for better presentation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, I believe this is a good paper with clear motivation and a reasonable design. It would help researchers in this field by providing a new benchmark and a self-supervised learning approach.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

Abnormality detection in chest X-rays (CXR) is notoriously hard, as the boundaries of lesions in CXR images are not clear. Training a model with full supervision, i.e., pixel-level annotation, is the gold standard, but producing these masks is very time-consuming. On the other hand, models trained using only image-level annotation perform poorly. A middle ground can be reached at limited time cost by providing point-based annotation for lesions. The authors propose a new regularization-based method to improve the results obtained with the latter annotation and present ablation studies to evaluate the impact of their regularization terms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method. Point DETR is a promising method for weakly semi-supervised object detection (WSSOD), and to the best of my knowledge the authors are the first to apply it to CXR images.

    The regularization. Most importantly, the authors propose some novelty in the form of stronger regularization for Point DETR. This regularization makes sense for the chosen application and improves the mean Average Precision.

    The experiments. The method’s evaluation is quite thorough, as an ablation study is performed before comparing the new method to existing ones for WSSOD on CXR. This evaluation is notably performed on two publicly available datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

My main critique is that, while the overall paper is quite clear regarding the technical aspects, some of the authors’ claims need to be refined or additional information needs to be stated:

• The starting point of the paper is Bearman et al.’s analysis, which was done on PASCAL VOC and not on CXR images. This should be stated.
    • In the abstract, an improvement of ~5% mAP is claimed. This improvement should be nuanced, especially with regard to which detector architecture / percentage of labeled data was used, as the improvement can vary greatly.
    • The authors claim to have produced a “publicly available benchmark”, while there is no mention that the code used will be made available. Only the training data is currently available.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the code is not made public, the data used is publicly available. The authors’ new method is also clearly explained, and the parameters for each experiment / augmentation are given. So while these exact results are not reproducible based on the information present in the paper, I would expect that similar results could be obtained when implementing from scratch.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Suggestions to answer my concerns:

• Clearly state that Bearman et al.’s work was done on the PASCAL VOC dataset and that the data was not CXR images.
    • Slightly nuance the mAP improvement claim in the abstract, e.g., give the average improvement for each detector architecture or when using a certain percentage of labeled data.
    • Regarding the claim of having produced a “publicly available benchmark”, either state that the code used to produce the results will be made public or change the claim to having produced a “new method”.

    Only a couple of typos, good job:

    • 4 Experiments, subsection “Dataset”: this subsection should rather be called “Datasets”.
    • Caption of Table 2: WSOD → WSSOD.
    • Overall, “chest X-rays” appears often even after the acronym “CXR” has been defined. You can probably replace a few occurrences.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors claim to have produced a “publicly available benchmark” without publishing any code, which does not fit my definition of such a benchmark. If this is because of the double-blind review process, they should clearly state so. The authors should either publish the code or change their claim to “a new method”.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

My concerns were answered quite well. Most issues raised by the other reviewers regarding clarity will also be fixed. So I believe that, with the release of the code, this paper will provide a useful benchmark to the MICCAI community.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper extends the Point DETR method to disease localization in chest X-ray images. Two consistency regularization strategies are proposed, and ablation studies demonstrate better performance. However, there are several major concerns from the reviewers, including the incremental contribution compared to the original Point DETR framework, the unclear description of the implementation, and reproducibility. The authors are invited to address these issues in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

[Overview] We sincerely thank all reviewers for their comments. We are pleased to see that the reviewers agree with the novelty (R1, R2 and R3), the potential for a new research line (R1, R2 and R3), and the convincing experimental results (R1 and R3) of our paper. Even so, some criticisms were raised. We rephrase the criticisms and provide responses to the major issues below.

1. Code release. [R1, R2, R3] A: To ensure reproducibility, we will officially release the code, trained models, and log files once the paper is accepted. This is confirmed in the checklist of our submission.

2. Contribution over Point DETR. [R1, R3] A: Our contributions are two-fold: i) Point DETR achieves competitive performance on natural images; however, it fails to deal with CXRs due to the unclear boundaries of lesions and the limited amount of data. To this end, this paper proposes two simple-yet-effective regularization terms to address these challenges by further exploiting useful information from both box- and point-level labels. Furthermore, the training strategy for weakly semi-supervised object detection (WSSOD) proposed in our study is easy to apply to other network architectures, e.g., FCOS / Faster R-CNN with a point encoder. The reason for adopting Point DETR as the baseline is that it achieves better performance than the others in our experiments. ii) To the best of our knowledge, this is the first work establishing a WSSOD benchmark for CXR abnormality localization with publicly available datasets and code. Compared to Point DETR, we conduct a more comprehensive benchmark, including semi-supervised, weakly supervised and fully supervised methods. This publicly available protocol provides an unbiased evaluation for future methods and encourages more researchers to work along this new research line.

3. Training details. [R1] A: In our study, Point DETR is trained using both box- and point-level annotations. In the first stage, points inside the manually labeled boxes are sampled as input to Point DETR, and the box-level annotations serve as the supervision signal. Then, Point DETR acts as the teacher, yielding pseudo box-level annotations for the student model from the point-level labels. In this second stage, Point DETR is regularized by the proposed consistency terms on the point-level labels so that it yields more accurate pseudo bounding boxes; a sketch of the pseudo-labeling step is given below.
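As an illustration of this second stage, the following sketch assumes a trained teacher callable with the interface `(image, points) -> boxes`; all names and types here are placeholders, not the released code:

```python
from typing import Callable, List, Tuple
import torch

# (image, points) -> predicted boxes; stands in for the trained Point DETR.
Teacher = Callable[[torch.Tensor, torch.Tensor], torch.Tensor]

def generate_pseudo_boxes(teacher: Teacher,
                          weak_data: List[Tuple[torch.Tensor, torch.Tensor]]):
    """Convert point-only annotations into pseudo box labels with the frozen
    teacher; the output then supervises a standard student detector
    (e.g., FCOS or Faster R-CNN)."""
    pseudo_labelled = []
    with torch.no_grad():  # the teacher is frozen during pseudo-labeling
        for image, points in weak_data:
            boxes = teacher(image, points)
            pseudo_labelled.append((image, boxes))
    return pseudo_labelled
```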

4. Experimental results in Tables I and II. [R1] A: Table I reports the results of the teacher model (Point DETR), while Table II shows the results of the student models (FCOS and Faster R-CNN).

5. Symmetric consistency for box-level annotation. [R1] A: We adopt data augmentation (e.g., flipping) for the box-level annotations, which provides guidance similar to symmetric consistency. To avoid unnecessary extra computational cost, we do not use symmetric consistency in Step 1.

6. More datasets. [R2] A: We respectfully emphasize that our method is evaluated on two CXR datasets (RSNA and VinDr-CXR) in Table II. Nevertheless, we appreciate the valuable comment. An additional experiment was carried out on the CVC dataset: our PBC achieves an improvement of 7.3% (AP-mask) over Point DETR with 20% fully labeled data. The results will be given in the final version.

7. Improving Fig. 2. [R2] A: Thanks for the suggestion. We will illustrate the difference between the baseline and our method in Fig. 2.

8. Clear statement of Bearman’s work and the improvement in the abstract. [R3] A: We will clearly state that the results in Fig. 1 are based on PASCAL VOC. Meanwhile, we will add the details (e.g., architecture, dataset and training ratio) for the improvement reported in the abstract.

9. Revising typos. [R3] A: We will carefully proofread the paper. Please note that ‘WSOD’ in Table II is not a typo, since PCL is a weakly supervised object detection method.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Based on the consensus of the three reviewers, the authors have addressed the major concerns. My own reading also supports acceptance, based on the methodological extension to the chest X-ray application and the improved performance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed the key concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper studies abnormality detection in chest X-ray (CXR) images, proposing a regularization-based method that emphasizes the importance of multi-point consistency and symmetric consistency to improve model robustness. The idea appears interesting, and the rebuttal has addressed most concerns raised by the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


