
Authors

Weibin Liao, Haoyi Xiong, Qingzhong Wang, Yan Mo, Xuhong Li, Yi Liu, Zeyu Chen, Siyu Huang, Dejing Dou

Abstract

While self-supervised learning (SSL) algorithms have been widely used to pre-train deep models, few efforts have been made to improve representation learning for X-ray image analysis with SSL pre-trained models. In this work, we study a novel self-supervised pre-training pipeline, namely Multi-task Self-supervised Continual Learning (MUSCLE), for multiple medical imaging tasks, such as classification and segmentation, using X-ray images collected from multiple body parts, including heads, lungs, and bones. Specifically, MUSCLE aggregates X-rays collected from multiple body parts for MoCo-based representation learning, and adopts a well-designed continual learning (CL) procedure to further pre-train the backbone subject to various X-ray analysis tasks jointly. Strategies for image pre-processing, learning schedules, and regularization are used to address the data heterogeneity, over-fitting, and catastrophic forgetting problems of multi-task/dataset learning in MUSCLE. We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection. Comparisons against other pre-trained models confirm the proof-of-concept that self-supervised multi-task/dataset continual pre-training can boost the performance of X-ray image analysis.
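A minimal structural sketch of the three-stage pipeline described above, in PyTorch-style code (all module names, shapes, and hyperparameters are illustrative assumptions, not the authors' released code):

```python
# Hypothetical sketch of the MUSCLE pipeline structure; not the authors' code.
import random
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet18(num_classes=128)  # shared feature encoder

# --- Stage 1: MD-MoCo. Aggregate X-rays from all datasets and run MoCo-style
# contrastive pre-training on the backbone (see the InfoNCE sketch later in
# this page for the objective).
# pretrain_moco(backbone, all_datasets)  # placeholder for stage 1

# --- Stage 2: multi-task continual pre-training. Each task keeps its own head
# on top of the shared backbone; tasks are visited cyclically, and the order is
# reshuffled every cycle to mitigate catastrophic forgetting. A cyclic
# learning-rate schedule (sketched separately below) would be attached here.
heads = nn.ModuleDict({
    "pneumonia_cls": nn.Linear(128, 2),  # two toy classification heads stand
    "bone_cls": nn.Linear(128, 2),       # in for the paper's four task heads
})
tasks = list(heads.keys())
opt = torch.optim.SGD(
    list(backbone.parameters()) + list(heads.parameters()), lr=1e-3)
for cycle in range(3):
    random.shuffle(tasks)                    # reshuffled task schedule
    for task in tasks:
        x = torch.randn(4, 3, 224, 224)      # stand-in for a task batch
        y = torch.randint(0, 2, (4,))
        loss = nn.functional.cross_entropy(heads[task](backbone(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 3: independent task-specific fine-tuning would then start from
# this backbone, one task (and dataset) at a time.
```

Keeping one head per task while sharing the backbone is what lets the continual stage inject task-specific inductive bias without discarding the representation learned in stage 1.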

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_15

SharedIt: https://rdcu.be/cVRYX

Link to the code repository

N/A

Link to the dataset(s)

https://nihcc.app.box.com/v/ChestXray-NIHCC

https://www.kaggle.com/kmader/rsna-bone-age

https://openi.nlm.nih.gov/faq#collection

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

https://www.kaggle.com/sonu26072001/mura-final

https://www.kaggle.com/nikhilpandey360/chest-xray-masks-and-labels

https://www.kaggle.com/usmanshams/tbx-11


Reviews

Review #1

  • Please describe the contribution of the paper

    As the title suggests, the topic of this paper is combining self-supervised learning, continual learning, and multi-task learning for deep X-ray image analysis. The proposed method aggregates X-ray images collected from different body parts for MoCo-based representation learning, together with a continual learning (CL) procedure for multiple X-ray analysis tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. As the authors claim, multi-task learning, self-supervised learning, and continual learning are jointly unified in one learning scheme.
    2. The overall learning paradigm is clear, and the authors employ many tricks to improve prediction accuracy on X-ray images.
    3. Many experiments validate their claims on different datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Generally, I have to say that merely confirming very high accuracies is not sufficient reason to accept a paper at a top conference. The authors conducted many experiments to show the superiority of the proposed method, yet the contribution of this paper is a simple combination of existing, well-known learning schemes. The authors only made some simple modifications to two ResNet networks.
    2. The authors do not make clear why combining self-supervised learning and continual learning is useful for multi-task problems.
    3. For the multi-task problem, using different datasets or parts collected from one body makes it hard to validate effectiveness. If possible, the authors should compare their work to multi-view or multi-modal learning from theoretical and experimental perspectives.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A lot of detailed information is necessary to reproduce this work, because the authors employ three learning schemes for one well-tested task.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See weakness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall learning scheme is a simple combination, while the experiments are very good.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    After reading the comments from the other reviewers and the rebuttal from the authors, I largely keep my opinion on this paper, i.e., weak reject, though I raise my score from WR to WA. Again, high accuracy alone is not a strong reason to accept a paper.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a self-supervised method for training pre-trained models for X-ray images, named MUSCLE. The method consists of pre-processing, pre-training, continual learning, and fine-tuning steps. MUSCLE adopts the MoCo method to train the backbone network from multiple datasets and uses continual learning to avoid over-fitting and “catastrophic forgetting”. In the results section, detailed experiments demonstrate the effectiveness of the training method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The innovative point of this paper is the extension of the MoCo framework to multi-dataset training with a cyclic, reshuffled learning schedule and a continual learning strategy. It uses multi-dataset pre-training to address the heterogeneity problem and continual learning to tackle catastrophic forgetting.
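A minimal sketch of such a cyclic, reshuffled schedule (hypothetical dataset names and hyperparameter values; not the paper's exact settings):

```python
# Cyclic learning rate plus a reshuffled dataset order; illustrative only.
import random
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for model parameters
opt = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=1e-4, max_lr=1e-2, step_size_up=100)  # LR cycles up and down

datasets = ["pneumonia", "bones", "lung_seg", "tb"]    # hypothetical names
for cycle in range(5):
    random.shuffle(datasets)          # a new dataset order every cycle
    for name in datasets:
        # one training pass over dataset `name` would run here
        opt.step()
        sched.step()
```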

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    However, the main contribution of the paper lies in training-pipeline innovation rather than training-algorithm innovation. The experiment section just lists the results; more analysis is needed to discuss the underlying reasons for the improvements.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is good; the algorithm is relatively easy to implement.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The results section could be enriched with more in-depth discussion of the results, rather than simply listing them.

    The column widths of Tables 2 and 3 need to be adjusted for clearer display. The text in Figure 2 is too small.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The innovativeness is moderate, the method is effective and results are convincing.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal, I still think the main contribution is the pipeline rather than the network model. I agree that it proves the feasibility of using multiple datasets/tasks to pre-train X-ray models via self-supervised representation learning. However, such a contribution is not considered strong novelty at a top conference like MICCAI. The authors explain that the lack of in-depth discussion is due to the page limit; I do not agree.



Review #3

  • Please describe the contribution of the paper

    MUSCLE is proposed to pre-train DNNs on X-ray images of multiple body parts for multiple tasks (classification, segmentation, detection) using self-supervised learning (SSL) and continual learning (CL) techniques. The pipeline has three stages: 1) MD-MoCo uses 9 X-ray datasets and a modified MoCo-CXR to SSL-pretrain a backbone DNN after preprocessing the data; 2) continual learning is applied to further pretrain the backbone with task-specific heads in a cyclic fashion across 4 tasks, with only 4 of the 9 datasets used from this stage onwards; 3) independent task-specific fine-tuning with 4 datasets is done using both the backbone and the 4 task-specific heads.

    The experiments compare MUSCLE against four baselines: Scratch, ImageNet, MD-MoCo, and MUSCLE–. Task-specific performance metrics are used for the comparison, and MUSCLE is shown to achieve better numbers on many of the metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of 9 publicly available datasets makes it possible for others to compare against. Evaluation of MUSCLE shows improvements for many of the reported performance metrics.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - Minor extensions and modifications to existing SSL and CL methods are proposed in this work and applied to X-ray images.
    - 7 of the 9 datasets are chest X-rays; only 2 datasets cover other body parts. The “multi-dataset” claim is narrower than it seems, because there is less diversity in the input data.
    - The performance is evaluated on only 4 datasets. Why is the evaluation not done on all 9 datasets used in the first-stage SSL?
    - ImageNet and Scratch are not strong baselines to compare against: ImageNet consists of RGB natural images, while this paper deals with grayscale X-ray images.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper has descriptions on some of the hyperparameters (learning rate schedule, weight decay, DNN architectures) and training methodology that could be useful in reproducing the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    - As accepted by the authors, MUSCLE is proposed as a “proof-of-concept”. It is not optimized for any single task, and no evaluation is done against other methods in the literature that could make it clinically useful.
    - Since the proposed framework is quite elaborate, with many components, it would be very helpful to make the source code public.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a well-written paper that explains the method clearly. The paper proposes a framework that makes minor modifications to existing SSL and CL techniques and shows improvements on most of the evaluation metrics.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper does not develop any new methods but uses existing pipelines for training. The strength of this paper is the combination of multi-task learning, self-supervised learning, and continual learning for training on large datasets. The authors evaluate the method on X-ray datasets; however, the idea is general and applicable to other datasets as well. The authors perform extensive validation of their method and also compare their results to other established networks, with superior performance.

    The authors are invited to respond to the reviewers' comments, especially on the question of novelty, and to justify why a combination of learning approaches is superior.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

Thank you for the time and effort you have spent reviewing our paper. We respond to the reviewers as follows.

Novelty & Existing Works (Reviewers #1, #3, Meta-Reviewer): MUSCLE still makes novel contributions even though it is a straightforward-yet-effective deep learning pipeline consisting of existing learning schemes. First of all, as mentioned in our article, we study the feasibility of using multiple datasets/tasks to pre-train X-ray models via self-supervised representation learning. The work most relevant to our study is MoCo-CXR (Sowrirajan et al. 2021 [11]), which leverages MoCo-based self-supervised representation learning to train the backbone for X-ray image analytics. We thus derive MoCo-CXR into SD-MoCo and MD-MoCo as two baseline algorithms under single-dataset and multi-dataset settings, respectively. The comparison against SD-MoCo and MD-MoCo demonstrates the advantages of MUSCLE on multiple tasks, due to the use of multiple datasets and multi-task continual learning (for task-specific discriminative features).

Finally, MUSCLE has been demonstrated not only on the five datasets mentioned in the paper and the appendix, but also on datasets that did not participate in any pre-training (MD-MoCo or continual learning) phase of MUSCLE. On the CheXpert dataset, it attains an average AUC of 88.52% with ResNet-18, outperforming Scratch by 5.62%, ImageNet by 5.73%, SD-SimCLR (single-dataset SimCLR) by 2.76%, MD-SimCLR (multi-dataset SimCLR) by 2.71%, SD-MoCo (single-dataset MoCo) by 0.41%, MD-MoCo by 0.34%, MD-MoCo-II (multi-dataset MoCo with ImageNet initialization) by 1.09%, and MUSCLE– by 1.07%; with ResNet-50 it attains an average AUC of 89.22%, outperforming Scratch by 11.65%, ImageNet by 9.96%, SD-SimCLR by 11.27%, MD-SimCLR by 9.56%, SD-MoCo by 1.82%, MD-MoCo by 1.99%, MD-MoCo-II by 0.49%, and MUSCLE– by 0.61%. On the Deep-Covid dataset, it attains a 99.94% AUC with ResNet-18, outperforming Scratch by 3.14%, ImageNet by 0.96%, SD-SimCLR by 0.83%, MD-SimCLR by 0.23%, SD-MoCo by 0.30%, MD-MoCo-II by 0.02%, and MUSCLE– by 0.05% (only 0.01% lower than MD-MoCo), and a 99.91% AUC with ResNet-50, outperforming Scratch by 1.78%, ImageNet by 1.00%, SD-SimCLR by 0.98%, MD-SimCLR by 1.47%, SD-MoCo by 0.08%, MD-MoCo by 0.15%, MD-MoCo-II by 0.16%, and MUSCLE– by 0.20%. The advantages on these tasks further confirm the generalizability of MUSCLE.

Why MUSCLE works (Reviewers #1, #2, Meta-Reviewer): As a “proof-of-concept”, we assume that X-rays from different body parts, collected for different analytical tasks, can help each other in machine learning; we therefore use self-supervised contrastive learning with normalization to learn visual representations from multiple datasets and obtain more generalizable backbones. The addition of continual learning allows the backbone to learn discriminative features for various tasks, including classification, detection, and segmentation, so as to better handle multiple tasks with one model.
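A minimal sketch of the MoCo-style InfoNCE objective underlying the contrastive pre-training described above (illustrative batch and queue sizes; the temperature 0.07 follows the original MoCo paper, and the authors' exact settings may differ):

```python
# MoCo-style InfoNCE loss on L2-normalized embeddings; illustrative only.
import torch
import torch.nn.functional as F

q = F.normalize(torch.randn(8, 128), dim=1)          # query embeddings
k = F.normalize(torch.randn(8, 128), dim=1)          # positive keys (momentum encoder)
queue = F.normalize(torch.randn(128, 4096), dim=0)   # queue of negative keys

l_pos = (q * k).sum(dim=1, keepdim=True)     # 8x1 positive logits
l_neg = q @ queue                            # 8x4096 negative logits
logits = torch.cat([l_pos, l_neg], dim=1) / 0.07   # temperature tau = 0.07
labels = torch.zeros(8, dtype=torch.long)    # the positive sits at index 0
loss = F.cross_entropy(logits, labels)
```

Treating the positive key as class 0 among the queued negatives turns contrastive learning into an (N+1)-way classification, which is why a single cross-entropy call suffices.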

In-depth discussion, elaboration, visualization, and source code (Reviewers #2 and #3): Due to the page limit, we did not include more discussion, elaboration, or full-size image visualizations in the manuscript. In fact, we carried out rigorous ablation experiments to demonstrate the importance of each module of MUSCLE. Specifically, we compared SimCLR and MoCo derivatives in our experiments to confirm the advantage of MoCo-based representation learning (as in MoCo-CXR). Further, the comparisons between SD-MoCo and MD-MoCo demonstrate the improvement made by the use of multiple datasets in MoCo-based X-ray analytics. Later, the comparisons between MUSCLE– and MD-MoCo examine the effectiveness of general continual learning. The comparisons among MD-MoCo, MUSCLE–, and MUSCLE finally confirm the benefit of our proposed multi-task continual learning procedure with cyclic learning rate, reshuffled schedules, and inductive bias. We will include the desired content, with source code released, in the camera-ready version and the appendix.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While the authors acknowledge that the contribution of their paper is a deep learning pipeline consisting of existing learning schemes, they did a good job of addressing the reviewers' concerns in the rebuttal. This led the reviewers to upgrade their scores from reject to accept, even though they maintained their original reservations. In my opinion, even if that is the case, the paper does a good job of evaluation on multiple datasets and experimental validation against state-of-the-art methods, and it addresses an important, challenging problem in the medical imaging community. This warrants the acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is an interesting paper that presents a novel self-supervised pre-training pipeline, called MUSCLE, which is based on existing known methods such as MoCo-based representation learning.

    MUSCLE seems to learn the representation of X-ray images well through learning to perform different tasks using data from various sources. The authors offer a wide range of tests and compare their approach with existing methods.

    While the authors offer comparisons with multiple methods on many datasets and report measures such as AUC, on which MUSCLE seems to outperform the other approaches, I could find neither confidence intervals nor statistical tests establishing significant superiority.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    15



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper focuses on empirical studies, combining a few previous methods rather than proposing a novel one. I believe empirical papers definitely have their merits, but we need to hold the experimental results to a more rigorous standard, because MICCAI readers would rely on the conclusions of empirical papers to guide their experiments.

    However, the major contribution, judging from the title, abstract, and introduction of the paper, seems to be leveraging X-ray images from multiple body parts. This was not verified in the experiments, for example, by comparing pneumonia classification using only chest X-rays against pneumonia classification using multiple body parts. This important experiment was missing. Including multiple body parts introduces extra preprocessing of the training images; if the performance gain is not significant, we should probably avoid guiding the community to include more images. Furthermore, the performance comparisons did not use strong baselines, and the improvement over single-dataset MoCo seems not that significant.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    12



Meta-review #4

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    AC recommendations on this paper were split, with a majority vote for rejection, while the reviewers expressed consensus in supporting acceptance after the rebuttal. The PCs thus assessed the paper reviews, meta-reviews, the rebuttal, and the submission. While the innovation was considered moderate and areas for future improvement were suggested, the reviewers were convinced by the effectiveness of the method and the presented evaluation and results. All reviewers participated in the rebuttal and remained supportive (or changed from rejection to supporting acceptance) after the rebuttal. The PCs agree with the convincing arguments of the reviewers and AC, and thus the final decision on the paper is accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


