Authors
Wentao Zhang, Yujun Huang, Tong Zhang, Qingsong Zou, Wei-Shi Zheng, Ruixuan Wang
Abstract
Currently intelligent diagnosis systems lack the ability of continually learning to diagnose new diseases once deployed, under the condition of preserving old disease knowledge. In particular, updating an intelligent diagnosis system with training data of new diseases would cause catastrophic forgetting of old disease knowledge. To address the catastrophic forgetting issue, an Adapter-based Continual Learning framework called ACL is proposed to help effectively learn a set of new diseases at each round (or task) of continual learning, without changing the shared feature extractor. The learnable lightweight task-specific adapter(s) can be flexibly designed (e.g., two convolutional layers) and then added to the pretrained and fixed feature extractor. Together with a specially designed task-specific head which absorbs all previously learned old diseases as a single ‘out-of-distribution’ category, task-specific adapter(s) can help the pretrained feature extractor more effectively extract discriminative features between diseases. In addition, a simple yet effective fine-tuning is applied to collaboratively fine-tune multiple task-specific heads such that outputs from different heads are comparable and consequently the appropriate classifier head can be more accurately selected during model inference. Extensive empirical evaluations on three image datasets demonstrate the superior performance of ACL in continual learning of new diseases. The source code is available at https://github.com/GiantJun/CL_Pytorch.
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43895-0_7
SharedIt: https://rdcu.be/dnwxP
Link to the code repository
https://github.com/GiantJun/CL_Pytorch
Link to the dataset(s)
https://challenge.isic-archive.com/data/#2019
https://www.cs.toronto.edu/~kriz/cifar.html
https://drive.google.com/drive/folders/1LMHgawD83Z5EmYN6wtLVIibZNrAZglZt?usp=sharing
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a parameter-isolation-based continuous learning algorithm to address the forgetting issue and uses a memory buffer with a (K+1) classification strategy to retrieve the task identifier information during testing. This framework consists of a fixed backbone with multiple pairs of adapter-classification heads. Each pair of adapter and classification heads is used to learn task-specific knowledge during training.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
S1: The presentation is clear. S2: The framework design is well motivated and makes sense. S3: The results are collected via multiple trials.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
W1: The training and inference algorithm is unclear: 1) how to fine-tune all heads to make classification head selection accurate, and 2) how to perform inference, which is essential for a multi-head network. W2: The baselines are a little weak (to name some SOTA methods: Der++ [1], Dual prompt [2]). W3: The results analysis is rough. W4: It is hard to evaluate the reproducibility.
[1] Boschini, M., Bonicelli, L., Buzzega, P., Porrello, A., & Calderara, S. (2022). Class-incremental continual learning into the extended der-verse. IEEE Transactions on Pattern Analysis and Machine Intelligence. [2] Wang, Z., Zhang, Z., Ebrahimi, S., Sun, R., Zhang, H., Lee, C. Y., … & Pfister, T. (2022, November). Dualprompt: Complementary prompting for rehearsal-free continual learning. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI (pp. 631-648). Cham: Springer Nature Switzerland.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Some details are provided to implement their method, but source code is unavailable for review.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
C1. A major concern is how to conduct inference in batch mode, which is the de facto standard for deep neural networks. Assuming that a batch contains samples from different tasks, how is task identification (i.e., selecting the task-specific modules) performed for this batch of data? C2. A detailed training algorithm would be helpful. C3. The results analysis should be more detailed.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Please see strengths and weaknesses.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
I decided to maintain my initial score for this paper because the rebuttal partially addressed my concerns. I am satisfied with the clarification of the training and inference process, and I suggest the authors add the pseudocode. For the comparison, the reported results of Xder and DualPrompt are a bit surprising, as they achieved worse performance. The authors may add some explanation for that.
Review #2
- Please describe the contribution of the paper
The paper presents a solution to the issue of catastrophic forgetting in intelligent diagnosis systems that lack the ability to continually learn and diagnose new diseases while preserving old disease knowledge. The proposed solution involves a novel adapter-based strategy that enables the learning of new diseases without changing the shared feature extractor. This is achieved by designing learnable lightweight task-specific adapters and a specially designed task-specific head that absorbs previously learned old diseases as a single “out-of-distribution” category. The proposed method is evaluated on three image datasets and shows superior performance in continual learning of new diseases. The source code will be made publicly available.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Text and visualization of the paper are well-written. It also investigates continual learning, a challenging problem in the medical disease classification.
- The paper aims to use pre-trained large-scale vision models trained on natural images and then adapt them efficiently to medical data using prompt learning.
- SOTA results obtained by the proposed method are satisfactory and convincing.
- Table 2 and 3 show that paper contributions are promising.
- The out-of-distribution (OOD) technique used for task inference is novel.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- I believe this paper lacks novelty, as it is somewhat similar to the adapter technique used in vision and language models [1, 2]. This paper uses a similar idea to develop its main contribution. I would encourage the authors to differentiate their work from [1, 2] in the related work.
- In this paper, the authors aim to improve the continual learning performance of convolutional neural networks, and in Table 3 they aim to show the generalizability of their method across different CNN architectures, e.g., ResNet18, EfficientNetB0, and MobileNetV2. Yet what about the vision transformer (ViT)? I would encourage the authors to explain why they did not use ViT, as the ViT architecture has recently set many records for different vision tasks.
[1] Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling, 2021. [2] CLIP-Adapter: Better Vision-Language Models with Feature Adapters, 2021.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Yes, it is reproducible.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
The paper is generally well written and does not need to be changed much. Nonetheless, the authors should perhaps distinguish their work from state-of-the-art prompt learning methods, e.g., Tip-Adapter and CLIP-Adapter.
LifeLonger paper (LifeLonger: A Benchmark for Continual Disease Classification, MICCAI 2022) also proposes a new setting called “fine/coarse grained cross-domain incremental learning” for evaluating any continual learning method in case distribution shifts across different datasets. I would encourage authors to check their model performance in cross-domain incremental learning as well. Furthermore, forgetting metric is also introduced in this paper. Following the LifeLonger paper, it would be really helpful if the authors reported the forgetting metric.
There is a lack of discussion of the prompt learning literature on vision models as well as vision and language models, although the paper builds on top of these enhancements. Please see:
[1] Prefix-tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. [2] Low-rank adaptation: LoRA: Low-Rank Adaptation of Large Language Models. [3] Prompt tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. [4] Visual prompt tuning: Visual Prompt Tuning. [5] Bayesian prompt tuning: Bayesian prompt learning for vision-language model generalization. [6] Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling, 2021. [7] CLIP-Adapter: Better Vision-Language Models with Feature Adapters, 2021.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The only issue is the lack of novelty. In that case, I will vote for weak rejection, but I will await other reviewers’ comments and rebuttals from the authors. I might upgrade my score to weak accept.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
I am satisfied with the authors’ rebuttal and would like to change my score to weak accept.
Review #3
- Please describe the contribution of the paper
The paper’s contribution is to provide a solution to the catastrophic forgetting issue in intelligent diagnosis systems by introducing an adapter-based strategy that can continually learn new diseases without changing the shared feature extractor.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method uses learnable, light-weight, and task-specific adapters that can be customized to fit different types of diseases.
- The task-specific heads, each with a special ‘out-of-distribution’ output neuron, help keep extracted features discriminative between different tasks.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper does not provide a detailed explanation of how the task-specific head absorbs previously learned old diseases as a single ‘out-of-distribution’ category.
- This paper lacks deep analysis or insights into why the proposed method works well in preventing catastrophic forgetting in intelligent diagnosis systems. (Maybe it has been demonstrated somewhere that I have not found.)
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Good. key resources (e.g., proofs, code, data) are available and sufficient details (e.g., proofs, experimental setup) are described such that an expert should be able to reproduce the main results.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- I suggest providing a more detailed explanation of how the task-specific head absorbs previously learned old diseases as a single ‘out-of-distribution’ category. This would help readers understand how this component contributes to preventing catastrophic forgetting.
- It would be better to differentiate the method from similar methods such as adapters [1] and from prompt tuning work such as VDT [2] and VPT [3].
[1] CLIP-Adapter: Better Vision-Language Models with Feature Adapters. [2] Visual Domain Prompt for Continual Test Time Adaptation. AAAI 2023
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
See the strength
- Reviewer confidence
Somewhat confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
All reviewers acknowledge the good presentation of this paper and the overall method design. However, reviewers have concerns about technical clarification (R1, R3), more detailed experimental analysis (R1, R3), and discussion/difference with more related works (R1, R2). The authors are given the chance to further clarify these points in the rebuttal.
Author Feedback
Q1 (R#1): How to fine-tune all heads and perform inference?
A1: All t heads are fine-tuned together by minimizing Eq. (2) on class-balanced data (note: a small amount of data is kept per old class). For each sample in a mini-batch and for each task-specific (TS) head, a TS feature vector is extracted by the shared feature extractor and the TS adapters, and then fed to the TS head. Each input is used for all t heads, as the ‘others’ class (t-1) times and as a non-‘others’ class once. Such multiple usage of each sample makes the outputs of the t ‘others’ neurons comparable across all t heads, and so helps the model select the appropriate head among multiple heads for each test sample. At inference, a similar forward process is run for each test sample. The head with the smallest ‘others’ output probability (among all t ‘others’ outputs) is selected to predict the input. Note that for a batch, the TS adapters of all tasks are used for each sample.
Q2 (R#1): About baselines (W2), result analysis (W3), and code (W4).
A2: DynaER is one SOTA method and is used here. As suggested, we ran Der++ and Dual prompt in our settings, with worse results (MCR 1.32-8.38 lower). We will include the new results, provide detailed analysis, and release the code.
Q3 (R#2, #3): Paper lacks novelty: similar to adapters in CLIP-Adapter (CA) and TIP-Adapter (TA). Also compare with prompt tuning.
A3: Our adapter differs from existing adapters in (1) structure: CA and TA use a 2-layer MLP or a cache model, while ours uses a 2-layer convnet with a global scaling factor; (2) number and locations in the model: CA and TA use an adapter only at the output of the last layer, while ours appears between each two consecutive CNN stages; (3) roles: existing adapters are for few-shot classification, while ours is for continual learning (CLearn). Also, initial tests based on CA’s MLP adapters for CLearn show that they are less effective (e.g., last-round MCR 38.36 on Skin8, with 4 rounds) than ours (MCR 50.38). It takes effort to apply TA’s adapters to CLearn (but we expect a similarly worse result). Our adapter also differs from current prompt tuning. Prompts appear as part of the input to the first and/or intermediate layer(s) of the model, often in the form of learnable tokens for Transformers or image regions for CNNs. In contrast, our adapter appears as an embedded neural module for each two consecutive CNN stages, in the form of a sub-network. Ours may be seen as a special new prompt for CNNs in a sense. We will discuss such differences in the paper.
Q4 (R#2): Why not use ViT as backbone and for generalizability evaluation?
A4: SOTA methods (e.g., DynaER) for CLearn on natural images are mainly based on CNNs, and ViTs have not completely surpassed CNNs, especially for medical image classification with small training sets. With the above considerations, we focus our study on CNN backbones. For generalizability, our stage-wise adapter can potentially be extended to ViT backbones. However, the proposed CNN-based adapter would need to be replaced by an MLP-based adapter to fit ViTs. We will investigate the extension in future work.
Q5 (R#2): Try on cross-domain incremental learning.
A5: The suggested evaluations were run. Ours is better (MCR 7.2-16.2 higher on MedMNIST).
Q6 (R#3): How the task-specific (TS) head absorbs old diseases as a single ‘out-of-distribution’ (OOD) category.
A6: A new TS head considers all C_t classes from the new task as in-distribution (ID) classes and the classes from all previous tasks as the OOD category. So, the preserved small data of all previous tasks are collected as OOD data. The ID data of the C_t classes and the OOD data are used to train the new TS adapters and head (also see A1).
Q7 (R#3): Why alleviate (note: not prevent) catastrophic forgetting?
A7: Learned old knowledge is implicitly stored in model parameters. Since the shared feature extractor (SFE) and the learned TS adapters (TSA) are fixed, the knowledge of old diseases stored in the SFE and TSA is preserved during continual learning. Also, all heads are fine-tuned together to help the model more likely select the correct head during inference.
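The target assignment described in A1 and A6 can be illustrated with a short Python sketch. It is not taken from the authors' repository: the function name `head_targets`, its arguments, and the convention that the ‘others’ neuron of head k sits at index C_k are assumptions made only for this example.

```python
def head_targets(task_id, local_class, classes_per_task):
    """Return one training target per head for a single sample.

    task_id          -- index of the task whose classes include this sample (assumed 0-based)
    local_class      -- index of the sample's class within that task
    classes_per_task -- list [C_1, ..., C_t] with the number of classes of each task

    Assumption for this sketch: head k has C_k + 1 outputs and its
    'others' neuron is the last one, i.e. at index C_k.
    """
    targets = []
    for head, num_classes in enumerate(classes_per_task):
        if head == task_id:
            # The sample is in-distribution only for the head of its own task.
            targets.append(local_class)
        else:
            # For every other head the sample is absorbed into 'others'.
            targets.append(num_classes)
    return targets


# Three tasks with 2, 3 and 3 classes; the sample is class 2 of the second task.
print(head_targets(task_id=1, local_class=2, classes_per_task=[2, 3, 3]))
```

With t heads, the returned list contains the ‘others’ index (t-1) times and a regular class index once, which matches the multiple usage of each sample stated in A1.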
Other comments will also be adopted to refine the paper.
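As a reading aid for the inference rule in A1 (select the head with the smallest ‘others’ probability), a minimal Python sketch follows. It assumes that each head outputs a probability list whose last entry is the ‘others’ neuron, and that global class indices are obtained by offsetting with the class counts of earlier tasks. The function name `select_head_and_predict` and the index mapping are hypothetical and not confirmed by the source.

```python
def select_head_and_predict(head_probs, classes_per_task):
    """Pick a head for one test sample and return (head index, global class index).

    head_probs       -- list of per-head probability lists; head k has C_k + 1
                        entries and the last entry is 'others' (assumed layout)
    classes_per_task -- list [C_1, ..., C_t] with the number of classes of each task
    """
    # Select the head whose 'others' probability is the smallest.
    best_head = min(range(len(head_probs)), key=lambda k: head_probs[k][-1])

    # Predict among the in-distribution classes of the selected head only.
    in_dist = head_probs[best_head][:-1]
    local_class = max(range(len(in_dist)), key=lambda c: in_dist[c])

    # Map to a global class index (an assumed convention for this sketch).
    offset = sum(classes_per_task[:best_head])
    return best_head, offset + local_class


# Two heads with 2 and 3 classes; the second head is least confident in 'others'.
probs = [[0.1, 0.1, 0.8], [0.2, 0.7, 0.05, 0.05]]
print(select_head_and_predict(probs, [2, 3]))
```

For a mini-batch containing samples from different tasks, the same selection would be applied to each sample independently, since A1 notes that the adapters of all tasks are run for every input.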
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
After the rebuttal, all three reviewers are positive about this paper. I recommend accepting this paper, although there are some important points that the authors should clarify in the final version, such as the low reported results of Xder and DualPrompt.
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This paper is well written as recognised by the reviewers. The novelty and training and inference process are clarified in rebuttal. Thus I would vote for accept.
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have clarified most of the reviewers’ major concerns, specifically technical novelty, experimental analysis, and discussion of related work. I find that adapter learning in class-incremental settings, as proposed here, is an emerging direction for adapting foundation models to domain-specific tasks. I recommend accepting this work for further discussion at MICCAI 2023.