
Authors

Zhizhong Chai, Huangjing Lin, Luyang Luo, Pheng-Ann Heng, Hao Chen

Abstract

Most existing object detection works rely on bounding-box annotation: each object has a precisely annotated box. For rib fractures, however, bounding-box annotation is very labor-intensive and time-consuming because radiologists need to investigate and annotate the fractures on a slice-by-slice basis. Although a few studies have proposed weakly-supervised or semi-supervised methods, they cannot handle different forms of supervision simultaneously. In this paper, we propose a novel omni-supervised object detection network, which can exploit multiple different forms of annotated data to further improve detection performance. Specifically, the proposed network contains an omni-supervised detection head in which each form of annotated data corresponds to a unique classification branch. Furthermore, we propose a dynamic label assignment strategy for the different annotation forms to facilitate better learning for each branch. Moreover, we design a confidence-aware classification loss that emphasizes samples with high confidence to further improve the model’s performance. Extensive experiments on the testing dataset show that our proposed method consistently outperforms other state-of-the-art approaches, demonstrating the efficacy of deep omni-supervised learning for improving rib fracture detection.
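
A minimal PyTorch sketch (not the authors' implementation) of what such an omni-supervised detection head could look like: one classification branch per annotation form over a shared feature map, plus a single localization branch. All names and layer shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class OmniSupervisedHead(nn.Module):
    """One classification branch per annotation form (box, dot, unlabeled)
    over a shared feature map, plus a single localization branch."""

    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.cls_branches = nn.ModuleDict({
            name: nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)
            for name in ("box", "dot", "unlabeled")
        })
        self.loc_branch = nn.Conv2d(in_ch, 4, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor):
        # Per-branch sigmoid score maps, plus box offsets for every location.
        scores = {name: torch.sigmoid(branch(feat))
                  for name, branch in self.cls_branches.items()}
        return scores, self.loc_branch(feat)
```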

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_23

SharedIt: https://rdcu.be/cVRs9

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes adopting an omni-supervised learning strategy to better leverage heterogeneous annotations when training rib fracture detectors. In addition to the omni-supervised strategy, the authors incorporate the “aggregated” confidence from the different heads into the final loss calculation to strengthen the “focal” idea introduced by focal loss. Extensive experimental results confirm the effectiveness of each technical component.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is well organized, making it easy to follow. Below are the detailed strengths from my view: 1. Using omni-supervised learning to handle heterogeneous annotations is novel; it makes it possible to simultaneously leverage the different forms of supervision for model training.

    2. The (1 - W_i) multiplier introduced in the classification loss further strengthens the “dynamic focal” idea; its effectiveness is confirmed by the ablation study (a hedged code sketch follows this list).

    3. Extensive experiments. In Table 1, the paper fairly compares against other state-of-the-art semi-supervised strategies for the task and demonstrates the effectiveness of the proposed ORF-Net. Table 2 further shows that the soft confidence multiplier helps to improve the model.
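
    Regarding strength 2, a hedged sketch of how a (1 - W_i) confidence modulator could be layered onto a standard sigmoid focal loss. The paper defines W_i precisely; here the per-pixel weight `w` and the direction of the weighting are assumptions based on the description above.

    ```python
    import torch
    import torch.nn.functional as F

    def confidence_aware_focal_loss(logits, targets, w, alpha=0.25, gamma=2.0):
        # Standard sigmoid focal loss, computed per pixel.
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)        # prob. of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        focal = alpha_t * (1 - p_t) ** gamma * ce
        # Hypothetical confidence modulator: w is a per-pixel weight in [0, 1]
        # derived from the branch confidences (the paper's W_i).
        return ((1 - w) * focal).mean()
    ```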

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Though the paper is generally well presented, I do have the following questions:

    1. Fig. 1 is a bit confusing. From my understanding, the model should have a single-backbone, multi-head structure; namely, a single image is passed through the network and the different branches work together to provide supervision based on the image’s annotations. However, Fig. 1 seems to show an image triplet being sent to the network. Could you confirm which is correct?

    2. What will the score be during inference? Since we have three branches, how do we aggregate the confidence?

    3. Does the “CA” version of the experiments in Table 2 also apply to the box regression? The paper seems to claim that only the classification loss adopts the confidence multiplier. If so, is there any explanation why?

    4. Given the 3D nature of CT scans, how are the 2D slices selected to train and test the model? Is only the key slice used? Moreover, since 310 box-annotated images are used for testing, does that mean all testing cases contain fractures?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Positive, if the code and the private dataset are released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please address the questions I listed in the weakness section. In addition, I would suggest reformatting Equation 1 to follow best practice for matrix, vector, and scalar notation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Even though I have many questions regarding the training and testing process, the idea of the paper is relatively novel for the task.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors proposed a label assignment strategy for training a rib fracture detection framework using data with three different levels of annotation, i.e., boxes, centers of objects, and none. Data with each type of annotation is used to train a dedicated classification branch. All annotations are utilized as pixel-level supervision (inside or outside the object box) during training. An inter-guided map (IGM) for each branch, computed from the predictions of the other two branches, is used as the ground truth (after thresholding) whenever annotations are not available. A private dataset of CT images is employed for the experiments and evaluation. Superior results of the proposed method are reported in comparison to previous omni-supervised methods and other label assignment strategies, i.e., self-guided maps (SGM).
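
    To make the IGM idea concrete, a minimal sketch under the description above; the simple averaging fusion and the fixed threshold are assumptions, not necessarily the paper's exact formulation.

    ```python
    import torch

    def inter_guided_map(p_a, p_b, t=0.5):
        """Fuse the sigmoid score maps of the two *other* branches into a
        guidance map and threshold it into pixel-level pseudo labels."""
        w = (p_a + p_b) / 2          # assumed fusion: simple averaging
        pseudo_gt = (w > t).float()  # foreground wherever agreement exceeds t
        return w, pseudo_gt
    ```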

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The manuscript is overall well-written and easy to follow.
    • The proposed method utilizes all levels of annotation and transforms them all into pixel-level supervision. The idea is valid and sound.
    • An ablation study on the components of the proposed method is conducted and discussed.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have several concerns, and some parts are unclear:

    • It is not clear how many patients are involved in the private dataset. Also, is the data split patient-wise?
    • Have the authors considered or experimented with varying the amounts of data with the different types of annotation?
    • SGM and IGM are not that different. Have the authors considered other maps, e.g., a unified W computed from P_b, P_d, and P_u?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Enough information to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A fairly clear presentation of an add-on training strategy (though the innovation is limited), with grounded comparison and justification of the effectiveness of the proposed method. I only have some suggestions for further investigation, which would be good to have.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents an omni-supervised learning framework to train a CT-based rib fracture detection model. The proposed framework features a shared feature pyramid network backbone and an omni-supervised detection head, which supports supervision from box-annotated data, dot-annotated data, and unlabeled data. A dynamic label assignment strategy is introduced to combine multiple branch outputs and guide the model training. Experiments are conducted on a new rib fracture dataset of 2239 images. The results demonstrate the proposed method’s efficacy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper introduced an omni-supervised learning framework, which can leverage box annotation, dot annotation, and unlabeled data.
    2. This paper introduced a dynamic label assignment strategy to combine the output from multiple branches for model training.
    3. Experiments on the new dataset demonstrated the efficacy of the proposed method and the dynamic label assignment strategy.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details of the proposed method need further clarification.
      • What is the rationale for combining the output from the two other branches to guide the training of one branch?
      • What does “three different annotation types of data are equally sampled” mean?
      • Are the regression results combined using non-maximum suppression?
      • Is the i in W_i, p_i, b_i the index of different pixels in a feature map?
    2. The threshold t is set to 0.5 for the dynamic label assignment and confidence-aware classification loss. How does the threshold value affect the model performance?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    With more details about the proposed method, the paper should be fairly easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Figure 1 should be improved to clarify the label assignment’s input and output. The current figure is a little confusing.
    2. Section 3.2, “Note that, We enable …” -> “Note that, we enable …”
    3. “As shown in Table 1, By simply …” -> “As shown in Table 1, by simply …”
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduced a new omni-supervised learning framework, which is useful for training models with different types of supervision. This can be extended to many different problems and applications. The proposed method also achieves state-of-the-art performance on a new rib fracture dataset.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers agreed that the authors presented an interesting approach to the heterogeneous annotation problem, applied to rib fractures. This is indeed a common and difficult problem to address. Moreover, the experiments were judged to be quite extensive and convincing. Thus, this work merits acceptance at MICCAI.

    However, several issues were pointed out, mainly due to clarity and explanation. In particular, Figure 1 was unclear to two reviewers. Reviewers also asked some other pertinent questions as to the rationale and mechanics of the system. Please try to clarify these as much as possible.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We thank the meta reviewer and all the reviewers for the affirmation and constructive comments. We will address all the concerns thoroughly in the final version.

Figure 1 [R1,3]: We will revise and improve Figure 1 according to the reviewers’ comments in the final version.

Datasets [R1,2]: (1) For box-labeled data, we asked the radiologists to annotate and check all slices in the CT images, and these slices were used for training and testing. For dot-labeled data, we only use the slices with labeled points for training. (2) Data split: We collected a total of 2239 CT images from 2239 patients with rib fractures for model training and testing, where the training, validation, and testing sets were split at the patient level. The testing set contains 310 CT images with rib fractures, and all slices with and without fractures are used for testing.

Method [R1,2,3]: (1) Rationale & mechanics: Inspired by the success of co-training, which adopts a mutual supervision mechanism to minimize the divergence on unlabeled data, we combine the inter-guided map generated from the outputs of the other two branches with the annotations to better supervise the current branch, prompting the different branches to maximize their agreement on the different kinds of annotated data and thereby achieve better performance. (2) Confidence-aware loss: Our experimental results show that the confidence-aware loss used on the classification branch brings a large improvement. The proposed confidence-aware loss is not used for the box regression branch; this will be further investigated in our future work. (3) Data sampling: To ensure that each branch can be fully trained, we equally sample the data with the three different annotation types in each mini-batch. (4) Inference process: During the inference stage, we first compute the average score of the outputs from the three classification branches and then combine it with the regression result from the localization branch to generate the final detection result. We adopt non-maximum suppression (NMS) on the regression results given their classification scores, and the IoU threshold for NMS is set to 0.6 in all experiments.
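
For concreteness, the inference step described in (4) could look like the sketch below; the function and variable names are hypothetical, while the score averaging and the NMS IoU threshold of 0.6 come from the rebuttal.

```python
import torch
from torchvision.ops import nms

def aggregate_and_detect(boxes, s_box, s_dot, s_unlabeled, iou_thr=0.6):
    # Average the scores of the three classification branches per candidate,
    # then suppress overlapping boxes with NMS at IoU 0.6.
    scores = (s_box + s_dot + s_unlabeled) / 3.0
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```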


