
Authors

Kaushik Roy, Peyman Moghadam, Mehrtash Harandi

Abstract

The performance of a lifelong learning (L3) model degrades when it is trained on a series of tasks, as the geometry of the embedding space changes while novel concepts are learned sequentially. The majority of existing L3 approaches operate on a fixed-curvature (e.g., zero-curvature Euclidean) space that is not necessarily suitable for modeling the complex geometric structure of data. Furthermore, existing distillation strategies apply constraints directly on low-dimensional embeddings, discouraging the L3 model from learning new concepts by making the model overly stable. To address this problem, we propose a distillation strategy named L3DMC that operates on mixed-curvature spaces to preserve already-learned knowledge by modeling and maintaining complex geometric structures. We propose to embed the projected low-dimensional embeddings of fixed-curvature spaces (Euclidean and hyperbolic) into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS) using a positive-definite kernel function to attain a rich representation. Afterward, we optimize the L3 model by minimizing the discrepancy between the new sample representation and the subspace constructed from the old representations in RKHS. L3DMC is capable of adapting to new knowledge better without forgetting old knowledge, as it combines the representational power of multiple fixed-curvature spaces and operates on a higher-dimensional RKHS. Thorough experiments on three benchmarks demonstrate the effectiveness of our proposed distillation strategy for medical image classification in L3 settings.
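The RKHS distillation objective described in the abstract can be illustrated with a small numerical sketch. The snippet below is not the authors' implementation; it assumes an RBF kernel and uses the kernel trick to compute the squared RKHS distance between a new sample's (implicit) feature map and the subspace spanned by the old samples' feature maps.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), a positive-definite kernel
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def rkhs_subspace_distance(x_new, X_old, gamma=1.0, eps=1e-8):
    """Squared RKHS distance between phi(x_new) and its orthogonal
    projection onto span{phi(x_old_i)}, via the kernel trick:
        d^2 = k(x, x) - k_x^T (K + eps*I)^{-1} k_x
    where K is the Gram matrix of the old samples and k_x the vector
    of kernel evaluations between old samples and the new one."""
    K = rbf_kernel(X_old, X_old, gamma) + eps * np.eye(len(X_old))
    k_x = rbf_kernel(X_old, x_new[None, :], gamma)[:, 0]
    k_xx = 1.0  # for the RBF kernel, k(x, x) = 1
    d2 = k_xx - k_x @ np.linalg.solve(K, k_x)
    return max(d2, 0.0)
```

Averaging this distance over a batch would give a distillation loss: a new representation that still lies in the span of the old representations incurs (near-)zero penalty, while one that drifts out of the old subspace is penalized.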

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_12

SharedIt: https://rdcu.be/dnwxU

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The paper describes using a curved embedding space in incremental learning applications (i.e., applications in which tasks and associated datasets are added over time). The approach is evaluated on three public datasets and compared to seven state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized, and the approach is well described. The code is publicly available, and the datasets used are also publicly available. The approach is compared with seven state-of-the-art methods, and the resulting numbers are quite favorable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The biggest weakness of the paper is the fact that it is not presented in the context of a clear clinical application. While the performance of the method has been evaluated with clinical data, it seems that the authors created the described approach for incremental learning in general, and only later decided to map it onto a clinical context. The paper would fit MICCAI much better if a very clear clinical application had been described in which incremental learning is important and the described catastrophic forgetting is a serious issue.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details, and certainly in combination with the publicly available code and data, the reproducibility can be considered excellent.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    My most important suggestion, in addition to the strengths and weaknesses described above, would be to present much more clearly in Section 6 what the task is, what is being classified, and what the disjoint label spaces are for the three datasets used.

    Question: what does the forgetting rate of -0.70 in Table 1 mean? Minor comment: please mark the best scores in Table 1 in bold.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is clearly well described and validated. The lack of a clear clinical application hampers the submission to MICCAI. Perhaps a more fundamental machine learning conference would be more suitable.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors study the mixed-curvature space for the continual medical image classification task, although mixed-curvature representation learning has been studied previously. They propose a novel knowledge distillation strategy to maintain a similar geometric structure during continual learning. This is done by minimizing the distance between a new embedding and the subspace constructed from old embeddings in RKHS. Their quantitative results show that the proposed distillation strategy is capable of preserving the complex geometric structure of the embedding space and leads to significantly less performance degradation in continual learning. The proposed method also shows superior performance compared to state-of-the-art baseline methods on three different medical image datasets: BloodMNIST, PathMNIST, and OrganaMNIST.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and coherent. It has sound mathematical proofs and concepts, with the flow of each formulation easy to follow and grasp. The paper shows that the method outperforms other benchmark models (also a weakness, discussed below).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The benchmark models seem dated; more recent state-of-the-art models should be included.

    The related work and benchmarks need to include other recent papers on incremental and continual learning with different strategies, not limited to knowledge distillation. Why should one care about this specific mixed-curvature strategy?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is no code available for validation, but the authors provided supplementary material for better clarification, which is appreciated.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In Figure 1, the depiction of knowledge distillation and of the tangent plane in hyperbolic space is confusing and somewhat misleading. The figure should be illustrated more clearly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the minor weakness, namely the lack of newer benchmarks and of related work from recent papers at top venues in incremental learning, the reviewer suggests a strong accept. This is a great theoretical work, beneficial both for the medical imaging community and for general computer vision downstream tasks.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This is the first work to study mixed-curvature spaces for continual medical image classification. A novel knowledge distillation strategy is proposed to maintain a similar geometric structure during continual learning by minimizing the distance between a new embedding and the subspace constructed from old embeddings in RKHS. Experimental results demonstrate that the proposed method is effective and outperforms the state of the art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is a novel formulation for continual learning. This work proposes a new knowledge distillation strategy, which maintains the geometric structure by minimizing the distance between a new embedding and the subspace constructed from old embeddings in RKHS, where the embedding spaces are constant-curvature spaces, Euclidean or hyperbolic.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It would be helpful to explain why a hyperbolic embedding space is preferred for the distillation method. In general, computation in hyperbolic space, such as the Poincaré disk in the 2D case, is highly unstable because the conformal factor approaches infinity near the boundary.
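    To make the reviewer's stability concern concrete, the snippet below sketches the standard remedy used in hyperbolic deep-learning code (an assumption on my part, not something taken from the paper): clipping embeddings slightly inside the ball so that the conformal factor stays finite.

```python
import numpy as np

def clip_to_ball(x, c=1.0, eps=1e-5):
    # Project x back into the open Poincare ball of curvature -c,
    # staying eps inside the boundary so the conformal factor
    # lambda_x = 2 / (1 - c * ||x||^2) remains finite.
    max_norm = (1.0 - eps) / np.sqrt(c)
    norm = np.linalg.norm(x)
    if norm >= max_norm:
        x = x * (max_norm / norm)
    return x

def conformal_factor(x, c=1.0):
    # Blows up as ||x|| -> 1/sqrt(c); clipping bounds it by ~2/eps.
    return 2.0 / (1.0 - c * np.dot(x, x))
```

    With clipping, the conformal factor is bounded by roughly 2/eps, which keeps gradients of hyperbolic operations numerically manageable.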

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The algorithmic details are well explained. The test datasets are public. The algorithm should be reproducible. The concern is the numerical stability of computation in hyperbolic space.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The work is well written; it can be further improved by adding more detail to the theoretical part.

    Figure 1 should be more intuitive and explained in more detail: is the Poincaré ball of the old model a submanifold of the Poincaré ball of the new model? Is the Euclidean embedding space equivalent to the tangent space of the Poincaré ball? Is the embedding space of the old model a subspace of the embedding space of the new model?

    The Riemannian metric and the isometric transformation group of the Poincaré ball should be added. The geodesics and the exponential map should be explained and illustrated with a figure. The geometric meaning of the Möbius addition of two points needs further explanation.
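    For reference, the standard definitions the reviewer asks for, in the convention common in the hyperbolic neural-network literature (curvature −c with c > 0), are:

```latex
% Poincare ball: D_c^n = { x in R^n : c ||x||^2 < 1 }
% Riemannian metric (conformal to the Euclidean metric g^E):
g_x = \lambda_x^2 \, g^E, \qquad \lambda_x = \frac{2}{1 - c\|x\|^2}

% Mobius addition:
x \oplus_c y =
  \frac{(1 + 2c\langle x, y\rangle + c\|y\|^2)\, x + (1 - c\|x\|^2)\, y}
       {1 + 2c\langle x, y\rangle + c^2 \|x\|^2 \|y\|^2}

% Exponential map at x (traces the geodesic through x with direction v):
\exp_x^c(v) = x \oplus_c
  \left( \tanh\!\Big( \frac{\sqrt{c}\,\lambda_x\,\|v\|}{2} \Big)
  \frac{v}{\sqrt{c}\,\|v\|} \right)
```

    Geometrically, x ⊕_c y transports y along the ball's geometry so that geodesics through x play the role that straight lines through x play in Euclidean space; as c → 0, Möbius addition reduces to ordinary vector addition.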

    A similar distillation strategy could be applied in a Euclidean embedding space as well. Why is the hyperbolic embedding superior to conventional methods?

    The isometry group of the Poincaré ball should be addressed. For example, in the 2D case it is the Möbius transformation group; therefore the embeddings are not unique, and the Möbius transformation group should be quotiented out in some way.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work proposes a novel distillation strategy, which maintains the geometric structure for continual learning by using hyperbolic embedding space. The idea is novel and the experimental results are convincing.

    There are some concerns: why is a hyperbolic embedding better than a conventional Euclidean embedding; whether it is necessary to quotient out the isometry group; and how to handle the numerical instability of hyperbolic space.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work studied mixed-curvature spaces for continual medical image classification. A novel knowledge distillation strategy was proposed to maintain a similar geometric structure during continual learning by minimizing the distance between a new embedding and the subspace constructed from old embeddings in RKHS. Experimental results demonstrated that the proposed method is effective and outperforms the state of the art. The reviewers acknowledged its novelty and solid experimental results. In the camera-ready submission, the authors are encouraged to incorporate the reviewers' comments to further improve the presentation quality.




Author Feedback

N/A


