Authors
Kunlei Hong, Lin Guo, Yuan-ming Fleming Lure
Abstract
Tuberculosis (TB) is the second leading cause of infectious disease death, and chest X-ray is one of the most commonly used methods to detect TB. In this work, we propose a Self-Rating Curriculum Learning (SRCL) method to address the task of localization and segmentation of tuberculosis on chest radiographs. A total of 12,000 CXR images of healthy subjects and bacteriologically confirmed TB patients, retrospectively collected from multi-center local hospitals, are used in this study. A classical instance localization and segmentation framework (Mask R-CNN with a ResNet-50 backbone) is used to compare the traditional one-step training method with our proposed SRCL method in terms of metrics and efficiency. First, a teacher model with a self-rating function, requiring no human participation, is developed to output a rating score for each sample, and all samples are classified into three categories, namely an easy set, a moderate set, and a hard set, using a kernel density estimate (KDE) plot. After grouping the images in order of difficulty, SRCL training is conducted on progressively harder images in three stages. The proposed SRCL method is evaluated in terms of metrics and efficiency, and results indicate that it boosts performance over the traditional training method.
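As a minimal, illustrative sketch (not the paper's code) of the KDE-based difficulty grouping described in the abstract, the snippet below assumes the teacher model has already produced one self-rating score per image, that higher scores mean harder cases, and that the group boundaries are picked from the density plot; the quantile cut-points shown are placeholders rather than the paper's published values.

```python
# Illustrative KDE-based grouping of samples into easy / moderate / hard sets.
import numpy as np
from scipy.stats import gaussian_kde

def split_by_difficulty(scores, low_q=0.33, high_q=0.66):
    """Split samples into easy / moderate / hard index sets from rating scores."""
    scores = np.asarray(scores, dtype=float)

    # Smooth density over the scores; in the paper this KDE plot is inspected
    # to choose the group boundaries.
    kde = gaussian_kde(scores)
    grid = np.linspace(scores.min(), scores.max(), 512)
    density = kde(grid)

    # Hypothetical cut-points; the paper derives its boundaries from the KDE plot.
    lo, hi = np.quantile(scores, [low_q, high_q])
    easy = np.where(scores <= lo)[0]
    moderate = np.where((scores > lo) & (scores <= hi))[0]
    hard = np.where(scores > hi)[0]
    return easy, moderate, hard, (grid, density)
```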
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_65
SharedIt: https://rdcu.be/cVD7l
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a self-rating curriculum learning (SRCL) method for localization and segmentation of Tuberculosis on chest X-ray images. Experiments were conducted to compare the performance of the proposed method with that of the teacher model, Resnet50-FPN with Mask R-CNN, and the experimental results show the proposed method outperforms the teacher model.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper proposes an effective ranking function using self-rating scores instead of the prior knowledge of human experts. The experimental results show that SRCL improves mAP, although AUC is not improved.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The proposed ranking function has some novel aspects; however, the idea of SRCL is not new. Self-ranking curriculum learning, self-paced curriculum learning, and automated curriculum learning have been proposed and reported in prior work.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The dataset download link is not provided. The training code and model are not provided.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
This paper presented an interesting research topic and proposed an effective ranking function for scoring images. The experimental design and evaluation of the data are satisfactory, and the conclusions are justified. The manuscript is written in clear and concise English. However, the paper can be further improved by (1) referencing the latest state-of-the-art curriculum learning papers published in the last three years and comparing the proposed method with those state-of-the-art methods; (2) using multiple datasets to validate the proposed method; (3) using more evaluation metrics such as sensitivity and specificity; and (4) rectifying some typos, such as “6000 case” should be “6,000 cases” (Page 3), “9600 samples” should be “9,600 samples” (Page 5), and “achieve 4943” should be “achieve 4,943” (Page 7).
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed ranking function design may be of interest to a broad audience and may inspire other researchers in designing their own ranking functions when using curriculum learning methods.
- Number of papers in your stack
4
- What is the ranking of this paper in your review stack?
3
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Review #2
- Please describe the contribution of the paper
Proposed an automatic method to rank image difficulty in order to perform curriculum learning by gradually adding more difficult images into the training set.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The proposed difficulty ranking algorithm was fully automatic, therefore making it more efficient to perform curriculum learning without human intervention.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1) The problem being solved (TB segmentation/classification) was not very challenging given very high and almost identical AUCs using the proposed method and the teacher network (see Table 2); 2) There was not a lot of novelty in the deep learning network. Mask-RCNN with ResNet was used.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
It should be reproducible.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1) In the paper, AP50/mAP50 was mentioned several times, but I couldn’t find their definitions. Please add one or two sentences of definition in the text where they are first mentioned. 2) Page 4, design of the ranking function: this was probably the most important part of the paper. However, there was no justification of why it was designed this way. Please at least provide a high-level, intuitive justification of why this would be the optimal design. 3) In Table 2, the results using SRCL were only slightly better than straightforward training in the AP metrics but not in the AUCs, which made me wonder how much the proposed SRCL really helped improve the results. One additional experiment that could be helpful would be to compare the SRCL in the paper to a different self-ranking algorithm (for example, a very naive algorithm that only looks at the classifier output probability and its difference from the ground-truth labels). If it could be shown that the proposed self-ranking algorithm is superior to a naive self-ranking algorithm, I think the results would be a little stronger.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Even though the proposed self-rating of image difficulty for curriculum learning seemed useful and somewhat novel, I wasn’t convinced that it couldn’t be replaced by a straightforward method such as comparing how much the ground truth differs from the classifier output probability.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
3
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Review #3
- Please describe the contribution of the paper
The study proposes a model training approach called self-rating curriculum learning. The idea of curriculum learning is to start the training process with data that are relatively easy to predict and gradually increase the difficulty level of the data. According to the authors, one challenge for this approach is building the difficulty measurer, which typically requires human expert prior knowledge and effort. The study proposes a self-rating approach for the difficulty-measurer part, which does not require human participation. A teacher model is trained to classify the data into categories. The data are then gradually used to train a model to localize and segment the TB-affected areas on CXRs. The authors also curated a large amount of TB patient data from multi-center hospitals to develop and test the model. It is not clear whether the authors will share the dataset.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The idea is simple and applicable to all types of medical images and most of the medical image analysis tasks.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The paper could have been written in a better way, with better formulations, a more organized way of listing parameters, and clearer comparison outcomes. I think the method is not compared to a “without SRCL” approach. What would be the results of the Mask-RCNN+Resnet50 backbone architecture trained with the same training CXR dataset without the SRCL approach and tested on the same test set? Then we would have a better understanding of how much SRCL has contributed to the learning process. Are these results somewhere in the text? (The paper needs better organization, especially the experimental section, when presenting the test results.)
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Although the authors curated a large TB-CXR dataset, my understanding is that they are not sharing the data with the manuscript. The authors also did not share code. The experimental parameters are provided in the text.
The idea is applicable to most medical image analysis problems.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
The authors mentioned the TB studies in the literature, and one issue is the lack of radiologists’ annotations. The mentioned study, CheXpert, is indeed one of the largest CXR datasets, and its annotations are automatically extracted from radiology reports using an NLP approach, which contains mistakes. Then, a group of radiologists at RSNA went over some portion of the dataset for manual checking and used this subset as ground truth. But, as far as I remember, this large CXR dataset does not contain TB patient data.
I would suggest authors go over the manuscript to provide a better-organized manuscript.
I think the method is not compared to a “without SRCL” approach. What would be the results of the Mask-RCNN+Resnet50 backbone architecture trained with the same training CXR dataset without the SRCL approach and tested on the same test set? Then, we would have a better understanding of how much SRCL has contributed to the learning process.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
A nice and simple solution to increase model training performance, which is applicable to most medical image analysis problems. However, the proposed approach did not provide a comparison with the base solution.
- Number of papers in your stack
5
- What is the ranking of this paper in your review stack?
2
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
Reviewers recognized the value of the research topic, the design of the proposed ranking function, and the general-purpose pipeline. In the rebuttal, more details will be needed to justify the motivation/intuition of the method design and the performance gain of the different components, so as to clarify the contribution from each of them.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
4
Author Feedback
We thank the reviewers for the positive comments and suggestions. First, we’d like to stress that our contribution is how to build a self-rating ranking (SRR) function for curriculum learning (CL) without human expert prior knowledge, which is highly recognized as effective (R1,2,3) and novel (R1,2). Reviewers pointed out that our design may be of interest to a broad audience (R1), may inspire many in the field of CL (R1) across all types of medical images (R3), and has good clarity and organization (R1,2). We address the major concerns below.
Q1: Self-ranking CL (SRCL), self-paced CL (SPCL), and automated CL (ACL) have been proposed in the past (R1). There was not enough novelty in the network (R2), and more justification is needed for the motivation behind the ranking-function design (R2, meta-reviewer). A1: ACL has been reported before, and both our SRCL and SPCL belong to ACL, but with different difficulty-measurement processes. The strength of our SRCL is well received by R1 and R3, as it establishes an SRR function for the teacher model to divide the training dataset into different hardness levels before student training; this differs from SPCL, which lets the student act as the teacher and decides the difficulty of training examples based on their losses. Instead, the main motivation of our ranking function is to score training images by difficulty with reduced subjective bias and time cost compared with traditional CL methods involving human participation, and to combine both missed diagnoses and misdiagnoses synthetically in each sample (Sec. 2.4). Since deep learning innovation is not our goal, the Mask-RCNN+Resnet50 backbone architecture (MRR50) is used on the same training and validation sets to investigate the contribution of SRCL to the learning process.
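Purely as a hypothetical illustration of combining missed diagnoses and misdiagnoses into a single per-sample score, as described conceptually above, the sketch below shows one possible form; the weights, normalization, and exact form of the ranking function in Sec. 2.4 of the paper may differ.

```python
# Hypothetical per-image difficulty score combining missed lesions (false
# negatives) and spurious detections (false positives) from the teacher model.
def self_rating_score(n_missed, n_spurious, n_gt_lesions, w_miss=1.0, w_fp=0.5):
    """Return an illustrative difficulty score (assumed: higher = harder)."""
    miss_rate = n_missed / max(n_gt_lesions, 1)    # fraction of lesions missed
    return w_miss * miss_rate + w_fp * n_spurious  # penalize spurious detections too
```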
Q2: Further improvement by citing state-of-the-art CL papers from the last 3 years (R1), and additional experiments comparing MRR50 trained on the same training dataset without SRCL and tested on the same test set (R3) and against a naive self-ranking algorithm (R2). A2: We do cite [11,12,13] from 2021, and there may be some misunderstanding about the comparison experiment. R2 is correct that we conduct a comparison experiment between SRCL and the baseline solution (i.e., the teacher model without the SRR function). All the training, validation, and test sets are the same; the only difference is that the training set is fed into a one-step training process in the baseline, whereas the same training set is used in SRCL by gradually adding more difficult images into the training process (Sec. 2.4). As emphasized in the motivation (see A1) and agreed by R1 & R3, comparing CL with/without the SRR function on MRR50 with the same dataset is sufficient to evaluate the contribution of SRCL to the learning process. An investigation comparing different self-ranking algorithms is planned for another paper.
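As a hedged sketch of the contrast drawn in A2 between one-step baseline training and the three-stage SRCL schedule, the code below assumes a generic, hypothetical `train_one_stage(model, dataset, epochs)` routine and placeholder epoch counts; the paper's actual per-stage settings are not reproduced here.

```python
# One-step baseline versus three-stage curriculum on the same training data.
def baseline_train(model, easy, moderate, hard, train_one_stage, epochs=30):
    # Baseline: the whole training set is fed at once.
    return train_one_stage(model, easy + moderate + hard, epochs=epochs)

def srcl_train(model, easy, moderate, hard, train_one_stage, epochs_per_stage=10):
    # Curriculum: start with the easy set, then gradually add harder images.
    stages = [
        easy,                      # stage 1: easy set only
        easy + moderate,           # stage 2: add the moderate set
        easy + moderate + hard,    # stage 3: the full training set
    ]
    for stage_data in stages:
        train_one_stage(model, stage_data, epochs=epochs_per_stage)
    return model
```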
Q3: Sensitivity (SS) and specificity (SP) (R1), and the performance gain of the different components of the model (meta-reviewer), may be needed for further improvement; the AP metrics (whose definitions need to be added) showed better performance, but the AUCs did not show much improvement (R2). A3: Our proposed SRCL is used to localize and segment the target (i.e., tuberculosis); hence mAP (mean average precision), AP50 (average precision at an IoU threshold of 0.5), and AUC are used as metrics. mAP and AP50 are commonly used to compare localization accuracy, whereas AUC is used to compare classification accuracy across different SS and SP operating points. Our results show that the classification accuracy of both SRCL and the baseline solution reaches the same high level of AUC, which may be due to a well-designed architecture (e.g., residual blocks) and the use of a multi-center dataset with good generalizability. At the same high classification accuracy, the localization accuracy of SRCL outperforms the baseline solution by about 3%, which is recognized as a substantial improvement in the CL field and may inspire others to develop their own self-ranking on their own datasets and image analysis tasks (R1).
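To make the AP50 convention defined in A3 concrete, here is a small illustrative helper (not the paper's evaluation code): a predicted box counts as a true positive when its IoU with a ground-truth box is at least 0.5, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
# Intersection-over-union and the IoU >= 0.5 matching rule behind AP50.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_boxes, thresh=0.5):
    # AP50 uses thresh=0.5; mAP averages AP over a range of IoU thresholds.
    return any(iou(pred_box, g) >= thresh for g in gt_boxes)
```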
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Rebuttal added clarifications on the contribution of this work.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
9
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This is a real gray-zone submission. I do have one specific objection: removing context and human expertise from CL in healthcare is not a positive development, in my opinion. If anything, I hope we move towards making knowledge an integral part of all AI models we build in the MICCAI community.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Reject
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
11
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
Both reviewer and AC recommendations on this paper were split with a large divergence. The PCs thus assessed the paper reviews, meta-reviews, the rebuttal, and the submission. It is noted that the reviewers appreciated the value of the research topic and the design of the proposed method. The primary AC felt that the authors successfully clarified the contribution of the work during rebuttal and recommended acceptance. Overall, while there are areas for improvement, the PCs agreed that the weaknesses were outweighed by the strengths, and the final decision of the paper is thus accept.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
NR