
Authors

Shafa Balaram, Cuong M. Nguyen, Ashraf Kassim, Pavitra Krishnaswamy

Abstract

Deep learning approaches achieve state-of-the-art performance for classifying radiology images, but rely on large labelled datasets that require resource-intensive annotation by specialists. Both semi-supervised learning and active learning can be utilised to mitigate this annotation burden. However, there is limited work on combining the advantages of semi-supervised and active learning approaches for multi-label medical image classification. Here, we introduce a novel Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL). Specifically, we leverage predictive uncertainty based on theories of evidence and subjective logic to develop an end-to-end integrated approach that combines consistency-based semi-supervised learning with uncertainty-based active learning. We apply our approach to enhance four leading consistency-based semi-supervised learning methods: Pseudo-labelling, Virtual Adversarial Training, Mean Teacher and NoTeacher. Extensive evaluations on multi-label Chest X-Ray classification tasks demonstrate that CSEAL achieves substantive performance improvements over two leading semi-supervised active learning baselines. Further, a class-wise breakdown of results shows that our approach can substantially improve accuracy on rarer abnormalities with fewer labelled samples.
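
At its core, the framework scores unlabelled images by an evidential (subjective-logic) uncertainty and queries labels for the most uncertain ones. The following is a minimal, hypothetical PyTorch sketch of that idea (softplus-derived evidence, Beta/Dirichlet vacuity as the uncertainty score, and top-k selection), not the authors' exact CSEAL acquisition function; the function names and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Per-label vacuity for multi-label evidential outputs.

    Each of the C labels is treated as a 2-way (absent/present) evidential
    head: non-negative evidence is mapped to Beta/Dirichlet parameters alpha,
    and the uncertainty mass is K / sum(alpha). Generic sketch, not the exact
    CSEAL formulation.
    """
    evidence = F.softplus(logits)            # (N, C, 2), non-negative evidence
    alpha = evidence + 1.0                   # Beta/Dirichlet parameters
    strength = alpha.sum(dim=-1)             # (N, C), Dirichlet strength
    return alpha.shape[-1] / strength        # vacuity per label, in (0, 1]

def select_for_labelling(logits: torch.Tensor, budget: int) -> torch.Tensor:
    """Return indices of the `budget` most uncertain unlabelled samples."""
    per_sample = evidential_uncertainty(logits).mean(dim=1)   # (N,)
    return per_sample.topk(budget).indices

# Usage with random stand-in predictions: 100 unlabelled images, 14 findings.
logits = torch.randn(100, 14, 2)
queried = select_for_labelling(logits, budget=10)
print(queried)
```

In the paper, the queried samples would then be annotated and folded back into the consistency-based semi-supervised training loop.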

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_64

SharedIt: https://rdcu.be/cVD7k

Link to the code repository

N/A

Link to the dataset(s)

https://nihcc.app.box.com/v/ChestXray-NIHCC


Reviews

Review #1

  • Please describe the contribution of the paper

    A Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL) is presented in this submission to mitigate the data annotation problem in supervised learning for radiological image classification, where consistency-based semi-supervised learning is combined with uncertainty-based active learning. The proposed CSEAL is applied to enhance four types of consistency-based semi-supervised learning methods. Evaluation on NIH-14 Chest X-Ray dataset is performed to demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL) is presented, where two major components are involved: evidential-based semi-supervised learning, and evidential-based active learning.

    • In terms of average test AUROC in the low-range labelling regime, the best-performing CSEAL method, eNoT+AU, outperforms the two baselines ToD+CoD [10] and VAT+Aug Var [7].

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The discussion of the results is insufficient to aid understanding. In particular, while Fig. 2 is relatively easy to understand without explanation, Fig. 3 definitely needs additional discussion.

    • The presentation and organization of this submission need to be improved. It is recommended to move each figure close to the text where it is referenced or discussed (e.g., Fig. 1 is on page 5, but the corresponding text is on page 3).

    • The limitations of the proposed approach and the directions for future research are not discussed in this submission.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is not submitted for review.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This submission can be improved in the following places:

    • Revise and add discussions for Fig. 3 to provide a clear understanding of the results.

    • Fix the figure placement by moving Fig. 1 closer to the text that refers to it.

    • Add discussion of the limitations of the proposed approach and the directions for future research.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The rating reflects the fact that this paper presents a new active learning framework for radiological image classification that achieves performance comparable to, or better than, the state of the art.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method to leverage both semi-supervised and active learning for the task of X-ray classification. The proposed method, called CSEAL, can be applied to any consistency-based semi-supervised learning method; the paper applies it to Pseudo-labelling, Virtual Adversarial Training, Mean Teacher and NoTeacher. Experiments on chest X-ray classification show that combining with active learning does improve performance compared to other semi-supervised + active learning methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors propose a novel combination of SSL + AL that can be applied to a variety of SSL methods.
    • The new method obtains better performance than strong baselines like COD + TOD and VAT + AugVar.
    • The improvements are especially good for rare classes, which are harder.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Parts of the paper could be rewritten for better clarity. The paper uses too many abbreviations, which makes it hard to read.
    • Limited experiments: the method is tested on only a single X-ray dataset with a single model (DenseNet-121 backbone). This makes it difficult to tell whether the results generalize. Further, it is unclear whether the same method (eNoT+AU) would be superior on every dataset.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Hyperparameters are specified in the paper, and the model has been clearly described. The dataset used is publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Improving the readability of the paper: please expand some of the abbreviations or use descriptive short names.
    • Adding more experiments showing that the method is broadly applicable would strengthen the paper.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel, and the authors obtain good results on an X-ray dataset. However, with limited experiments, it is difficult to justify a higher rating.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a new method (CSEAL) to enhance existing consistency-based semi-supervised learning methods. They compare it with existing semi-supervised active learning algorithms and show that their method improves over the baselines. On top of the semi-supervised learning, the authors introduce an uncertainty estimate for active learning. The results show that it improves over random sampling.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper improves classification performance in the presence of a small amount of annotated data and abundant unannotated data.
    • The application is indeed appealing in the biomedical domain, where expert annotations are scarce and expensive.
    • The way the CSEAL semi-supervised learning is formulated. In addition, the authors have a good grasp of the current state of the art.
    • The paper is well structured and the line of reasoning is clear.
    • The selected low-range labelling regime is realistic and well done.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While known semi-supervised algorithms are used to implement CSEAL, they are not benchmarked themselves. For example, ePSU is benchmarked on the dataset, but PSU (pseudo-labelling) itself is not. Therefore, the question arises whether the observed improvement over supervised learning comes from PSU or from the CSEAL loss.

    The authors also use AUROC to evaluate the methods. While AUROC is a well-known metric, it can be misleading and skewed for highly imbalanced datasets. It is suggested to report the area under the precision-recall curve (AUPRC) instead.
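
    To make this suggestion concrete, the sketch below computes both per-class AUROC and per-class AUPRC for multi-label predictions with scikit-learn; the arrays are random stand-ins, not the paper's outputs. AUPRC is more informative for rare findings because its chance level equals the class prevalence rather than 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical multi-label ground truth and scores: 200 images, 14 findings.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 14))
y_score = rng.random(size=(200, 14))

# Per-class AUROC and AUPRC (average precision).
auroc = roc_auc_score(y_true, y_score, average=None)
auprc = average_precision_score(y_true, y_score, average=None)

for c, (roc, prc) in enumerate(zip(auroc, auprc)):
    print(f"class {c:2d}: AUROC={roc:.3f}  AUPRC={prc:.3f}")
```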

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Since the code is not released, it is not clear to what extent one can reproduce the work from the paper alone.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Whenever a new framework is proposed, there is always the question of how it performs on different datasets/modalities. The framework is tested on the NIH-14 Chest X-Ray data, which is a challenging dataset. However, it is not clear how it would work on new datasets.

    Regarding the active learning part, the authors only provide AU for uncertainty sampling. They show that AU works better than random sampling, but it is not compared with other active learning algorithms. Therefore, the question remains whether AU is the best method for active learning.

    The figures are all readable. However, the abbreviations make them complicated to follow. For example, for Fig. 2 and Fig. 3 (which carry the main message of the paper), one has to remember a series of not very well-known abbreviations to understand the plots.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is quite interesting and seems useful, although there are some shortcomings in the reporting of the results.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a novel active learning and semi-supervised learning approach for diagnostic radiography classification tasks. The authors enhance four consistency-based semi-supervised learning methods within their framework. The proposed method outperforms leading semi-supervised active learning baselines at very low labeling budgets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper:

    1. The proposed method combines active learning and semi-supervised learning and leverages unlabeled samples during model training.
    2. They enhance four consistency-based semi-supervised learning methods within the framework.
    3. The proposed model (eNoT+AU) achieves the best performance on the diagnostic radiograph classification task.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some weaknesses of this paper are listed below:

    1. No ablation study of the proposed model is reported in the paper; for example, the contributions of the different loss terms are not analysed.
    2. Please add more explanations for the symbols in the equations, for example, E and \gamma in Eq. (2).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No further comments on the reproducibility. It looks good to me.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    An additional comment: please consider adding some example images from the dataset used in the paper so that readers know what kind of data you are working with.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of this paper is presented clearly, and details of model training and parameter settings are reported accurately. The proposed model is evaluated on a public dataset, compared with several state-of-the-art methods, and achieves good results. However, there are some weaknesses, such as the missing ablation study. Therefore, I recommend this paper be weakly accepted.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper introduces a new active learning and semi-supervised learning approach called CSEAL that can be applied to consistency-based semi-supervised learning methods. Results on chest X-ray classification show that the proposed approach is better than other semi-supervised + active learning baselines with low labelling budgets. This paper has received a consistently positive assessment from the reviewers. The positive points identified are: 1) the problem is well-motivated given that expert annotations are scarce and expensive; 2) the paper frames the proposed approach well w.r.t. the current state of the art; 3) the paper is well-written; 4) the low-range labelling regime is realistic and well done; 5) the approach can be widely applied to many SSL methods; 6) the method obtains better performance than strong baselines like COD + TOD and VAT + AugVar; and 7) good improvements for the rare classes. As for the negative points, we have: 1) the paper should have benchmarked the SSL methods used to implement CSEAL; 2) the experiments should have included area under the precision-recall curve results to better assess highly imbalanced datasets; 3) the paper should avoid using too many abbreviations; 4) it is hard to say whether the method will generalise beyond the single X-ray dataset with a single model (DenseNet-121 backbone); and 5) no ablation study. Given the positive comments and high scores, I recommend the paper be accepted.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the reviewers for their careful reading and constructive feedback on our work. We proposed and demonstrated a novel Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL) to address data annotation challenges for deep learning. We are encouraged by the reviewers’ enthusiasm for our work. The main points of feedback pertain to strengthening the benchmarking and performance metrics, streamlining the presentation for greater clarity, and including a discussion of the limitations (especially w.r.t. generalisability). We address these in turn below.

  1. Benchmarking and Performance Metrics

The reviews rightly point out that our manuscript did not benchmark the improvements offered by CSEAL in relation to the semi-supervised learning methods used within CSEAL. We agree that this will enable a more comprehensive evaluation of our approach, and have now compared our proposed evidential semi-supervised learning approaches in relation to the semi-supervised learning methods used for their implementation. We observe that the AUROC of our evidential semi-supervised learning approaches is comparable or up to 2% higher than that of their non-evidential counterparts, on average. We will include the results in the Supplement of the final manuscript.

Further, the reviewers highlight that it will be useful to include the AUPRC, beyond AUROC, given the class-imbalanced nature of the NIH-14 Chest X-Ray dataset. To address this issue, we have characterised the AUPRC of our best-performing CSEAL method in relation to the SOTA baselines. For the low labelling range (budget 5%), the AUPRC of the eNoT+AU approach is up to 2.6% higher than the SOTA semi-supervised active learning baselines, on average. For the mid labelling range (budget 10%), the AUPRC of eNoT+AU is up to 1.9% higher than SOTA, on average. We will update the Results sections of the final manuscript to elaborate on the AUPRC.

  2. Presentation

The reviewers highlighted the need to streamline the use of abbreviations, define symbols more clearly and make the discussion of data and results more accessible. We agree that this would substantially improve the readability of our manuscript. For the results exhibits, we will include additional descriptions of the abbreviations in legends and captions, and add a more detailed description of the results for Figure 3. We will also generally reduce the number of abbreviations used in the text and define symbols more expressly. Finally, space permitting, we will provide exemplar images from our dataset in the Supplement.

  3. Limitations and Future Directions

The reviewers have noted that our manuscript does not address generalisability beyond the X-Ray dataset and DenseNet121 backbone, and that an ablation study could provide deeper insights into the influence of the different loss components.

We concur with the reviewers that, although Chest X-Ray multi-label image classification is a challenging task, a current limitation of our approach is its demonstration on a single dataset and a specific CNN backbone. Future work could focus on extending our framework to more datasets from other radiology modalities (e.g., CT and MRI). Further, we agree that a theoretical and empirical investigation of the different loss components would provide insight into the factors contributing to the effectiveness of CSEAL. However, for our proposed framework, this would require performing an extensive study for all four of its methods. Given the space limitations of this conference paper, we leave this to future work. We propose to explicitly address these points by including a paragraph on the key limitations and future directions in the Discussion section of the final manuscript.


