
Authors

Nicholas Konz, Hanxue Gu, Haoyu Dong, Maciej Mazurowski

Abstract

The manifold hypothesis is a core mechanism behind the success of deep learning, so understanding the intrinsic manifold structure of image data is central to studying how neural networks learn from the data. Intrinsic dataset manifolds and their relationship to learning difficulty have recently begun to be studied for the common domain of natural images, but little such research has been attempted for radiological images. We address this here. First, we compare the intrinsic manifold dimensionality of radiological and natural images. We also investigate the relationship between intrinsic dimensionality and generalization ability over a wide range of datasets. Our analysis shows that natural image datasets generally have a higher number of intrinsic dimensions than radiological images. However, the relationship between generalization ability and intrinsic dimensionality is much stronger for medical images, which may be explained by radiological images having intrinsic features that are more difficult to learn. These results give a more principled underpinning for the intuition that radiological images can be more challenging to apply deep learning to than the natural image datasets common in machine learning research. We believe that rather than directly applying models developed for natural images to the radiological imaging domain, more care should be taken in developing architectures and algorithms tailored to the specific characteristics of this domain. The research shown in our paper, demonstrating these characteristics and the differences from natural images, is an important first step in this direction.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_65

SharedIt: https://rdcu.be/cVVqm

Link to the code repository

https://github.com/mazurowski-lab/radiologyintrinsicmanifolds/

Link to the dataset(s)

https://www.med.upenn.edu/sbia/brats2018/data.html

https://stanfordmlgroup.github.io/competitions/chexpert/

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226903

https://stanfordmlgroup.github.io/competitions/mura/

https://nda.nih.gov/oai/accessing_images.html

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=68550661

https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper empirically analyses intrinsic dataset manifolds and shows that medical imaging datasets have a lower number of intrinsic dimensions than natural images, yet medical image recognition tasks are still not easy. The authors indicate that their analysis highlights the importance of developing tailored medical image recognition approaches, rather than relying purely on advances in natural image recognition.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Great topic
    • Great organization of the paper
    • Very lean approach
    • Large enough benchmark
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I could not identify any. However, a code release that allows the findings to be reproduced would be nice.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The results are reproducible, but a code release would be nice.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As I have mentioned before, I am happy with the paper as is, but it lacks openly available code.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Good idea, experimental validation, and valuable insights
  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper computes the intrinsic dimension (ID) of datasets using the method in [18] for both natural and medical images. The experimental results show that medical images generally have a smaller intrinsic dimension as well as worse test accuracy than natural images. The results empirically show that there are differences between natural and medical images, and thus indicate that more careful thought is needed when transferring from natural images to medical images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The empirical results are sound: the ID estimation is applied to multiple medical and natural image datasets, and different choices of neural network show similar results.

    2. The finding of a negative dependence between test accuracy and ID is foreseeable, but the difference in the slope of this relationship between natural and medical images is novel and interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The theoretical contribution of this work is weak. The ID computation is based on the existing method of [17,18] and is directly applied to medical and natural images. Another concern is that the ID computation for natural images may differ from that for medical images. For example, natural images are bounded to 0-255 and use 3 channels, while radiological images are not bounded and normally contain only 1 channel. Also, although normalization layers might help train the neural network, the distribution of radiological image pixels is very different from that of natural image pixels, so existing neural networks such as ResNet may not be a good way to measure test accuracy.

    2. The ID computation is based on the assumption of a Poisson process and MLE. This is slightly concerning because of the lack of supporting evidence. For example, one could show whether an auto-encoder can learn a reasonable reconstruction error for different datasets to demonstrate the empirical ID of each dataset. This paper directly applies the existing method to this specific domain and shows no toy-data example to support the ID estimation. Therefore, the overall ID estimates might not be exact, and the finding of a different slope of the negative relationship may stem from aspects other than "ID difference and domain difference".

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Show some evidence that the ID estimation is correct/exact. Rule out all other potential factors that could affect the relationship between test accuracy, intrinsic dimension, and domain, in order to make a strong and clear statement.

    2. Fig. 4 right is pretty, but in the 3D version it is hard to see which surface lies above the other. Maybe reduce the \alpha or show the overlapping information in a better way.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Even though the empirical findings have some flaws and the theoretical contribution is limited, the finding of a different slope of the negative relationship between test accuracy and ID is a fresh way to think about the difference between radiological and natural images.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This is an unusual MICCAI submission in its pursuit of answering foundational questions about learning in the medical imaging domain. The authors use the analysis of intrinsic manifold dimension to compare medical and natural images in terms of how difficult they are to represent with neural nets.

    Strength: the question is fresh, and the approach is simple and easy to follow. The writing is great.

    Weakness: The formulation for ID is not new, so no new analytical tools are developed. The authors get some credit for being the first to apply these tools, but the insights that are presented are somewhat unsurprising. For example, given the limited variety of chest x-rays in CheXpert (they all show heart and lungs with similar intensity profiles), it is not a surprise to see a low ID. At the same time, the difficulties in classification and generalization are also expected, as the differences between diseased and normal images are very fine-grained.

    My suggestion is that the authors articulate how the work can practically help practitioners in medical imaging develop more accurate AI models.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We thank the reviewers for the helpful feedback. Citation numbers below refer to the references/bibliography in our paper.

Practical Implications of Our Findings (Meta-Reviewer): Regarding the relationship of generalization ability (GA) and dataset intrinsic dimension (ID), we showed that GA (test accuracy) is linearly related to the log of the training set size Ntrain and to ID (Fig. 4). Formally, this means Acc. = a * log(Ntrain) + b * ID + c, where (for ResNet-18) a ~= 3E-5 (Fig. 4 left), b ~= -0.02 (above Fig. 3), and c is a constant. This mathematical relationship, under the conditions evaluated in our paper, can help researchers balance desired performance with the cost of labeling new data by solving this equation for Ntrain given the target accuracy and dataset ID. Knowing the number of annotations needed (Ntrain) would help save time and expense. Note that dataset ID estimation only requires raw images and no labels.
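
To make this concrete, below is a minimal, hypothetical sketch (not the authors' code) of how one might fit the coefficients a, b, c of the stated relationship to one's own (Ntrain, ID, accuracy) observations and then invert it to estimate the training set size needed for a target accuracy. The function names and all numbers in the usage example are made up for illustration; in practice the fit should be restricted to the model architecture and image domain of interest, since the slope differs between radiological and natural images.

```python
# A hypothetical sketch (not the authors' code) of fitting
# Acc = a*log(N_train) + b*ID + c to observed runs and inverting it to
# estimate the training set size needed to reach a target accuracy.
import numpy as np

def fit_coefficients(n_train, intrinsic_dim, accuracy):
    """Least-squares fit of Acc = a*log(N_train) + b*ID + c."""
    X = np.column_stack([np.log(n_train), intrinsic_dim, np.ones_like(accuracy)])
    (a, b, c), *_ = np.linalg.lstsq(X, accuracy, rcond=None)
    return a, b, c

def required_training_size(target_acc, intrinsic_dim, a, b, c):
    """Invert the fitted relationship: N_train = exp((Acc - b*ID - c) / a)."""
    return np.exp((target_acc - b * intrinsic_dim - c) / a)

# Made-up observations for two hypothetical datasets (accuracy in [0, 1]):
n_train = np.array([500, 1000, 2000, 500, 1000, 2000], dtype=float)
ids     = np.array([13.0, 13.0, 13.0, 9.0, 9.0, 9.0])   # dataset intrinsic dimensions
accs    = np.array([0.70, 0.74, 0.78, 0.78, 0.82, 0.85])
a, b, c = fit_coefficients(n_train, ids, accs)
# Estimated N_train needed to reach 80% accuracy on the ID=13 dataset:
print(required_training_size(0.80, 13.0, a, b, c))
```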

Additionally, ID provides an estimate for the information content of the data, i.e., the minimum number of degrees of freedom that can be used to accurately describe the underlying image manifold [1]. This allows for the principled choice of dimensionality d of modeled image features in unsupervised approaches, e.g., setting the dimension of the latent “noise” prior vector for GANs trained on image data to d >= data ID, as verified in [24]. Disentangled representation learning models are another use case, where the content vector dimension could be set to d >= data ID. Further experiments with radiological images would be needed to verify this.

These are just applications that we can currently think of; we expect future works to reveal more.

Finally, we hope that our study will spark more research of fundamental questions in medical image learning, to move beyond adapting methods that originated from natural image analysis. Analyzing the cause for the different test accuracy vs. ID slopes between medical and natural image datasets (Fig. 3) is a promising starting point.

Potential ID Estimator Limitations/Confounding Factors (R#2): We believe our choice to apply an ID estimator that was previously validated on natural images to radiological images is valid for the following reasons.

  • This estimator has been validated (using GANs) on generated natural image data (Sec. 4 of [24]). This generated data has shared properties with real natural and radiological image data, such as high pixel count and visual/spatial features, which gave us confidence in using this estimator on radiological images.
  • We believe that modifying the number of image channels, i.e., color->grayscale, or rescaling/changing the normalization of the pixels, would have little effect on ID, as conceptually these should not modify the abstract spatial features which define the intrinsic information content of images that is measured by ID.

Also, we did not use auto-encoders to estimate ID because they are trained to extract features suited for reconstruction, rather than features of class identity that would be used by a supervised classifier.
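
For readers unfamiliar with the estimator under discussion, the sketch below illustrates the standard MLE (Levina-Bickel) intrinsic dimension estimator based on k-nearest-neighbor distances, i.e., the class of estimator Reviewer #2 refers to; it is an illustration under that assumption, not necessarily the exact implementation of [24]. The function name and the toy example (data with a known intrinsic dimension, in the spirit of the reviewer's suggested sanity check) are hypothetical.

```python
# A minimal sketch of the standard MLE (Levina-Bickel) intrinsic dimension
# estimator from k-nearest-neighbor distances, with the inverse-averaged
# (MacKay-Ghahramani) aggregation; this illustrates the type of estimator
# discussed above and is not necessarily the implementation used in [24].
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_intrinsic_dimension(x, k=20):
    """Estimate the intrinsic dimension of data x of shape (n_samples, n_features)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
    dist, _ = nn.kneighbors(x)
    dist = dist[:, 1:]                        # drop the zero distance to the point itself
    # Per-sample estimate: m_k(x) = [ (1/(k-1)) * sum_{j<k} log(T_k(x)/T_j(x)) ]^{-1}
    inv_m = np.log(dist[:, -1:] / dist[:, :-1]).mean(axis=1)
    return 1.0 / inv_m.mean()                 # aggregate by averaging the inverses

# Toy sanity check on data with a known intrinsic dimension: 2-D points
# linearly embedded in a 10-D ambient space (estimate should be close to 2).
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))
print(mle_intrinsic_dimension(points, k=20))
```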

Confounding Factors of the GA vs. ID Relationship (R#2): The relationship between GA and ID does not appear to depend on normalization layers, because the models without them (VGGs and SqueezeNet) resulted in GA vs. ID regression lines similar to those of the models with normalization layers (Table 2, Supplementary). Additionally, to mitigate any other factors, we ran experiments over a range of models, training set sizes, and tasks while keeping many experimental parameters fixed. As changing these settings had little effect on the observed trends of GA vs. ID within and between the radiological and natural image domains, we propose that the trends originate from something fundamental to the data itself, beyond any obfuscating factors.

Reproducibility (R#1): To estimate ID, we used [24]’s publicly available implementation. We will release our code for data loading, model training, etc. upon acceptance.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There is consensus. So I keep my original vote.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think this is a paper that many people at MICCAI would be interested in getting to know.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    upper



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
    • The paper studies the so-called intrinsic manifold dimensionality of medical images (vs. natural images).
    • The question is novel but the method is not. Since there is a consensus among the reviewers and the meta-reviewer regarding the value of the question and the good quality of the writing, I vote to accept the paper.
  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    na


