
Authors

Yuanyuan Chen, Xiaoqing Guo, Yong Xia, Yixuan Yuan

Abstract

Annotated images for rare disease diagnosis are extremely hard to collect. Therefore, identifying rare diseases from a scarce amount of data is of far-reaching significance. Existing methods target only rare-disease diagnosis while neglecting to preserve the performance of common-disease diagnosis. To address this issue, we first disentangle the features of common diseases into a disease-shared part and a disease-specific part, and then employ the disease-shared features alone to enrich rare-disease features, without interfering with the discriminability of common diseases. In this paper, we propose a new setting, i.e., generalized rare disease diagnosis, to simultaneously diagnose common and rare diseases. A novel selective treasure sharing (STS) framework is devised under this setting, which consists of a gradient-induced disentanglement (GID) module and a distribution-targeted calibration (DTC) module. The GID module disentangles the common-disease features into disease-shared channels and disease-specific channels based on the gradient agreement across different diseases. Then, the DTC module employs only disease-shared channels to enrich rare-disease features via distribution calibration. Hence, abundant rare-disease features are generated to alleviate model overfitting and ensure a more accurate decision boundary. Extensive experiments conducted on two medical image classification datasets demonstrate the superior performance of the proposed STS framework.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_49

SharedIt: https://rdcu.be/cVRuA

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a generalized network for rare disease diagnosis that simultaneously diagnoses common and rare diseases. The network includes a gradient-induced disentanglement (GID) module that separates common-disease features into disease-shared channels and disease-specific channels. It also includes a distribution-targeted calibration (DTC) module that uses disease-shared channels to enrich rare-disease features via distribution calibration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Network design is novel
    • Comparison with other existing methods
    • An ablation study is performed
    • The paper is well written
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Details of the network architecture are not clear
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide most of the details, but the network architecture needs to be described more clearly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Eq. (1): please revise it. The magnitude of g* is not included. Also, use the magnitude of g^i.

    For the GID module, how is the sort operation implemented? Also, since the sort operation is not differentiable, how does this affect backpropagation? Please clarify.

    For the DTC module, how is the mask generated? Please clarify. Also, provide a visual example if possible.

    The details of the network architecture are not clear. Please consider adding some details on the figure or add them in text.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • experiments and results are strong and comprehensive
    • paper is well written
  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes two modules to i) disentangle common-disease features into disease-specific and disease-agnostic channels based upon gradient agreement, and ii) use the disease-shared channels to enrich rare-disease features via distribution calibration. With a WideResNet backbone, the authors compare the method against six recent approaches, producing improved results on two medical image classification tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clear and easy to read, with good clinical motivation.
    • The method is well described, interesting and properly motivated.
    • Comparison with six recent approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The MICCAI writing template was clearly violated (removing spaces between text and table, wrapping text around table).
    • No statistical test is used to compare the methods in Table 1. This is particularly important as the method is compared against multiple baselines.

    Questions:

    • Was the WideResNet backbone also used for the baseline experiments?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Generally good. However, no ‘analysis of statistical significance of reported differences in performance between methods’, ‘average runtime for each result, or estimated energy cost’, or ‘description of the memory footprint’ is provided. As stated in Section 5, the lack of statistical comparison is a negative.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Don’t risk desk rejection by modifying the writing template, read https://conferences.miccai.org/2022/en/PAPER-SUBMISSION-AND-REBUTTAL-GUIDELINES.html carefully.
    • Use the supplementary materials for space.
    • Use a statistical test, e.g. the Wilcoxon signed-rank test, for Table 1, taking multiple comparisons into account.
    • Some of the phrases in the paper are strange, e.g. ‘identifying rare diseases based on a scarce amount of data is seen as inevitable’, ‘studying automated rare disease diagnosis using an extraordinarily scarce amount of medical images is of far-reaching significance’, and ‘selective treasure sharing (STS)’: what does ‘treasure’ refer to here?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Clearly written.
    • Interesting method applied to a relevant clinical problem.
    • Comparison with six recent papers.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper approaches computer-aided diagnosis by framing it as a few-shot learning problem. The most frequently occurring classes are used to learn shared and class-specific features; the distinction between these categories relies on gradient consistency across classes. Rare diseases then contribute class-specific increments on top of the shared features. To avoid biases induced by class imbalance and by normal/lesion areas, the authors learn to calibrate the distributions via an attention mechanism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The major strength of the paper is the formulation of the classification task as a few-shot learning task. The concept of splitting feature maps into shared and specific features based on gradient agreement addresses feature disentanglement in an elegant way by circumventing the interpretability issue. From an empirical point of view, the reported numbers show the superiority of the introduced method compared to the state of the art.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors could also have considered framing the problem as a continual few-shot learning problem with a single incremental epoch. A comparison with such a method, e.g. AIM (Lee et al., Few-Shot and Continual Learning with Attentive Independent Mechanisms, ICCV 2021), would have made the paper stronger.

    • The authors argue that learning new rare classes leads to a decrease in performance on the major classes. Although this is intuitive behavior, several methods from the incremental learning literature have been adopted in few-shot learning research and help alleviate the catastrophic forgetting issue. (Minor)

    • This paper might present a serious fairness issue, since it compares different methods that require different and partly contradictory setups in order to work well. For instance, the FCICL method, which relies on contrastive learning, expects larger batch sizes than what the authors report in Sec. 3 (Implementation). From the text, it is not clear whether the hyperparameters have been optimized for each method separately (or at least taken from the respective papers). If not, then we are in a setup that favors the proposed method over the other SOTA methods.

    • The description of the DTC module deserves a better formulation to make it easier to understand and reimplement. (Minor)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors give a good overview of the adopted hyper-parameters, but some details are still missing (for instance, the number of iterations and the augmentation pipeline). Also, given the ambiguity of the DTC module description, a faithful reproduction of the method might be difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In order to make the paper stronger, I kindly ask the authors to reformulate the DTC module description (for example, how the lesion region and the normal region are cropped). Also, clarifying the experimental setup of each of the compared methods would help avoid fairness issues and give reviewers and future readers a less biased impression of the empirical potential of the method. As mentioned in the weaknesses section, a comparison with a continual few-shot learning method would have been nice to get the full picture. This is definitely not required for accepting the paper, but in case of rejection, I strongly recommend adding this comparison.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The formulation of the problem as a few-shot learning problem and the nice formulation of the GID module are the most interesting parts of the paper. Distribution calibration is per se not a new concept, but the authors implement it in a slightly different manner. The final rating will depend on the fairness of the comparisons; in case of an appropriate setup, I am willing to upvote the paper.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    As the reviewers point out, the paper is well written and of interest to MICCAI, with a novel contribution and solid results. Of particular interest is the formulation of the classification task as a few-shot learning task.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR




Author Feedback

We sincerely thank all reviewers and ACs for their recognition of the novelty and clinical significance of this work. Here are responses to their invaluable suggestions and remaining concerns.

Q1. Network architecture (R1) We utilized the off-the-shelf WideResNet as the backbone of both the common and rare branches. Specifically, the WideResNet we used consists of 3 convolutional blocks, each having 2 convolutional layers followed by batch normalization and ReLU activation. The size of each convolutional kernel is 3 × 3. The output feature channels of the three blocks are 160, 320, and 640, respectively. When performing simultaneous common- and rare-disease diagnosis, the two branches are concatenated, followed by a fully connected layer, and fine-tuned using samples from all diseases.
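A minimal PyTorch sketch of the backbone as described above (this is not the authors' code; the downsampling strategy, the absence of residual connections, and the pooled-classifier detail are assumptions made for illustration):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 conv layers, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """WideResNet-style encoder: 3 blocks with 160/320/640 output channels."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.block1 = conv_block(in_ch, 160)
        self.block2 = conv_block(160, 320)
        self.block3 = conv_block(320, 640)
        self.pool = nn.MaxPool2d(2)  # downsampling choice is an assumption

    def forward(self, x):
        x = self.pool(self.block1(x))
        x = self.pool(self.block2(x))
        x = self.block3(x)           # final feature map: B x 640 x H x W
        return x

# For joint diagnosis, pooled features from the two branches could be
# concatenated and classified with a fully connected layer, e.g.
#   logits = nn.Linear(2 * 640, num_classes)(torch.cat([f_common, f_rare], dim=1))
# where f_common and f_rare are hypothetical B x 640 globally pooled vectors.
```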

Q2. Revision of Eq. (1) (R1) We have modified Eq. (1) by replacing the vector g^i with its magnitude |g^i|. The changes will be applied to the camera-ready version.

Q3. Sort operation in GID module (R2) To identify the disease-agnostic channels with the largest gradient consistency, we sum up the projection lengths of each channel and directly sort these values in descending order using the sort function in PyTorch. The sort operation does not affect loss backpropagation, since it can be detached from the computation graph of the network and the sorted vector does not require gradients. After identifying the indices of the disease-agnostic channels in the common branch, the corresponding channels are fed into the DTC module to assist rare disease diagnosis. During loss backpropagation in the rare branch, the common branch can also be updated through these selected disease-agnostic channels.
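A short sketch, under stated assumptions, of how this channel selection could look in PyTorch (names such as proj_len and num_shared are illustrative, not taken from the paper):

```python
import torch

def select_shared_channels(proj_len: torch.Tensor, num_shared: int) -> torch.Tensor:
    """Return the indices of the channels with the largest gradient consistency.

    proj_len: 1-D tensor of length C, one consistency score per channel.
    Sorting on a detached copy keeps the sort out of the computation graph,
    so it carries no gradient, as described in the response above.
    """
    _, order = torch.sort(proj_len.detach(), descending=True)
    return order[:num_shared]

# Hypothetical usage: feat is B x C x H x W from the common branch.
# shared_idx  = select_shared_channels(proj_len, num_shared=320)
# shared_feat = feat[:, shared_idx]   # indexing keeps feat in the graph, so
#                                     # gradients from the rare branch still
#                                     # flow back into the common branch
```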

Q4. Mask generation in DTC module (R2, R3) In the DTC module, the masks of the lesion and normal regions are generated using the off-the-shelf Expectation-Maximization Attention (EMA) module. EMA learns a compact set of bases for each image using the expectation-maximization algorithm. In detail, EMA regards the bases used for reconstruction as the parameters to be learned and their attention maps as latent variables. Given the current bases, the expectation (E) step estimates the expected attention maps, and the maximization (M) step updates the parameters (bases) by maximizing the complete-data likelihood. The E step and the M step execute alternately, and we obtain the normalized final attention maps after convergence. For each feature map, we select the two attention maps that best highlight the lesion region and the normal region, respectively, and obtain a lesion-region mask and a normal-region mask from these two attention maps.
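A rough sketch of EM-attention-style mask generation along the lines described above (the number of bases, iteration count, normalization, and the rule for picking and binarizing the "lesion" and "normal" maps are all assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def em_attention_maps(feat, bases, n_iter=3):
    """feat: B x C x H x W feature map, bases: K x C learnable bases.
    Returns per-basis attention maps of shape B x K x H x W."""
    B, C, H, W = feat.shape
    x = feat.flatten(2).transpose(1, 2)                 # B x N x C, N = H*W
    mu = bases.unsqueeze(0).expand(B, -1, -1)           # B x K x C
    for _ in range(n_iter):
        attn = F.softmax(x @ mu.transpose(1, 2), dim=-1)  # E-step: B x N x K
        mu = attn.transpose(1, 2) @ x                     # M-step: B x K x C
        mu = F.normalize(mu, dim=-1)                      # re-normalize bases
    return attn.transpose(1, 2).reshape(B, -1, H, W)      # B x K x H x W

# Hypothetical selection step: pick the two maps that best highlight lesion
# vs. normal regions, then binarize them into masks (threshold is illustrative).
# lesion_mask = (attn[:, k_lesion] > 0.5).float()
# normal_mask = (attn[:, k_normal] > 0.5).float()
```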

Q5. Backbone of other baselines (R2) No. When re-implementing the comparison methods, we used the original backbone suggested in the literature for each method.

Q6. Hyperparameters setup in comparison methods (R3) For a fair comparison, we set the hyperparameters in each comparison method to their default values suggested in the literature.

Q7. Number of iterations and augmentation pipeline (R3) When separately optimizing the common and rare branches, the number of iterations is set to 4000 for both branches. When fine-tuning the whole model for simultaneous common and rare disease diagnosis, the number of iterations is set to 2000. No data augmentation strategies are used in this study.
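For concreteness, a schematic of this two-stage schedule (only the iteration counts and the no-augmentation choice come from the response; the loop structure and helper names are illustrative):

```python
COMMON_ITERS, RARE_ITERS, JOINT_ITERS = 4000, 4000, 2000

# Stage 1: optimize each branch separately (hypothetical train_step helper).
# for it in range(COMMON_ITERS): train_step(common_branch, common_loader)
# for it in range(RARE_ITERS):   train_step(rare_branch, rare_loader)

# Stage 2: fine-tune the combined model on samples from all diseases.
# for it in range(JOINT_ITERS):  train_step(full_model, all_disease_loader)

# No data augmentation is applied in either stage.
```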

Q8. Comparison with few-shot continual learning (R3) Most few-shot incremental/continual learning methods assume that data from previous sessions are not accessible when learning from new sessions. In our setting, by contrast, data of common diseases remain available during the learning of rare diseases. To distinguish our setting from those methods, we model the simultaneous diagnosis of common and rare diseases as a generalized few-shot learning problem in this paper. In future work, we will, as suggested, explore how to preserve the diagnostic performance on common diseases from the perspective of avoiding catastrophic forgetting.


