
Authors

Mayur Mallya, Ghassan Hamarneh

Abstract

Medical imaging is a cornerstone of therapy and diagnosis in modern medicine. However, the choice of imaging modality for a particular theranostic task typically involves trade-offs between the feasibility of using a particular modality (e.g., short wait times, low cost, fast acquisition, reduced radiation/invasiveness) and the expected performance on a clinical task (e.g., diagnostic accuracy, efficacy of treatment planning and guidance). In this work, we aim to apply the knowledge learnt from the less feasible but better-performing (superior) modality to guide the utilization of the more-feasible yet under-performing (inferior) modality and steer it towards improved performance. We focus on the application of deep learning for image-based diagnosis. We develop a light-weight guidance model that leverages the latent representation learned from the superior modality, when training a model that consumes only the inferior modality. We examine the advantages of our method in the context of two clinical applications: multi-task skin lesion classification from clinical and dermoscopic images and brain tumor classification from multi-sequence magnetic resonance imaging (MRI) and histopathology images. For both these scenarios we show a boost in diagnostic performance of the inferior modality without requiring the superior modality. Furthermore, in the case of brain tumor classification, our method outperforms the model trained on the superior modality while being comparable to the model that uses both modalities during inference.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_29

SharedIt: https://rdcu.be/cVRU9

Link to the code repository

https://github.com/mayurmallya/DeepGuide

Link to the dataset(s)

  1. Derm7pt: https://github.com/jeremykawahara/derm7pt

  2. RadPath: https://miccai.westus2.cloudapp.azure.com/competitions/1


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposed a student-teacher method that distills knowledge learned from a better-performing (superior) modality to guide a more-feasible yet under-performing (inferior) modality and steer it towards improved performance in two clinical tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper trains a guidance model to map the latent representation of the inferior modality to that of the superior modality, and classification performance improves slightly on two different tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The improvement of the method is not significant, which is consistent with the intuitive expectation. In my opinion, the paper is a type of feature-level fusion work, which is common in the domain.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. To compare model performance across different input features, statistical analysis should be used. I am not sure whether the displayed differences are statistically significant.
    2. The model did not perform better than direct fusion of the two imaging modalities, which makes the work not important enough.
    3. A multi-site external dataset (if one exists) should be used to test generalization.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    the model performance and the methodological novelty

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a method to guide a neural network with an “inferior” input modality based on a network trained on a superior input. The method is tested on two datasets with two different target tasks, and the authors demonstrate that it can even outperform the model trained on the superior modality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a method with ample novelty and clinical relevance. The experimental design is well motivated and described such that other researchers should be able to reproduce the results. The proposed method is benchmarked against sound baselines and findings supported by multiple well-justified metrics on two datasets centered on two different tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No major weaknesses to report.

    Comments:

    • You evaluated the performance on multiple splits. Please consider also reporting the standard deviation across the runs to allow estimating the robustness of the results (if the available space is too tight, possibly in the supplementary materials alongside the boxplots).
    • Consider highlighting the best performing methods in the tables for easier interpretation
    • Although widely known, consider introducing abbreviations like MRI and MSE at their first use
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors use publicly available datasets and provide details about the techniques used, such that I feel confident I would be able to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is concisely written and nicely guided through the text. The related work is introduced in an appropriate length for such a conference paper. The figure and tables support the findings and contribute to the understanding.

    Please consider the comments listed under 5.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper includes both technical novelty and clinical relevance. Together with the careful preparation of the paper and the good reproducibility, I propose to accept this contribution.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper describes a multimodal / student-teacher learning approach for training classifiers to do a task based on inferior (cheaper, more accessible, more convenient, etc) medical data vs. superior (expensive, exotic, invasive, etc) medical data, and transferring information from the superior to inferior classifier to improve the performance of the inferior classifier.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is the importance of the problem setting, which is very widespread in medicine (i.e., the desire to use inferior modalities to do jobs that are well performed by superior modalities).

    Another strength is the evaluation, which very carefully teases out what exactly is added to task performance by superior data, inferior data that has been transformed towards superior data, and raw inferior data. This evaluation is done on two entirely different medical data classification domains– another strength.

    Clarity of writing is an additional strength. It is difficult to get confused about how this method works or why it might be thought to work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty is not 100% clear. We are told that references 11, 16, 20, and 30 do a highly similar thing to this paper… but there is no explicit statement about whether the current paper consists 100% of applying a useful technique from another domain to this domain; or whether there are novel methodological aspects.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data sets are publicly available. The code foundation is publicly available (PyTorch, TensorFlow, etc). The implementation description is provided in some detail. I believe reproducibility is high.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    “Consequently, it would be advantageous to leverage the inferior modalities in order to alleviate the need for the superior one. However, this is reasonable only when the former can be as informative as the latter.” Luckily for the authors, this assertion is not true. If it were true, the authors would be in trouble because their own data shows that inferior is not as informative as superior. The problem with the assertion is that there are many use cases when inferior is not expected to replace superior 100% of the time; let’s say both modalities are locally available, but we are trying to reduce usage of the superior modality due to cost, burden, or some other reason. Using the inferior modality as a cheap screening tool that rules out, say, cancer without the superior modality most of the time (say in 90% of cases), with the other 10% of cases requiring the superior modality, would still make a large positive impact in many application domains. This kind of cheap screening problem setting is not really considered here.

    The description of the tradeoffs associated with image-to-image translators is mostly OK. It’s true that they don’t deal with image-to-non-image translation very well or differences in dimensionality etc. But an additional stated weakness is essentially “what if it doesn’t do image-to-image translation very well?” That’s a kind of non-informative critique– you could say “what if it doesn’t work” about any hypothetical method. The upside of image-to-image translators– that you can use the translated image for a multitude of different tasks, not just the one you did inferior-to-superior translation for– is not mentioned. The current method trades off that flexibility with perhaps superior performance on a more limited job (i.e. the classification task).

    To me, “guidance” is confusing. The term has an active connotation, akin to active learners that are provided novel training examples optimally selected for them by an oracle as “guidance” to improve their representations. Really, using more traditional terminology, the inferior and superior classifiers include feature selectors that reduce the dimension of the input data in a way that is beneficial for classification; what you are calling “guidance” is really feature translation or transformation.

    Table 1 is very difficult to follow, with notation that is only partially explained in the caption. Where the rest of the notation is described in the text, it’s in code, e.g. we need to translate from “G(I)+I” in the text to “G(R)+R” in the table. Make it easier for the reader by writing out the row names in the table in English.

    The discussion seems to take it as given or non-surprising that the inferior modality adds value on top of the superior modality in both applications. How do we know that there is really independent information in the inferior modality, as opposed to just poor guidance / feature transformation from inferior to superior?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main reasons for the strong accept are the compelling clinical application, seemingly very clear and straightforward approach, and rigorous multi-faceted evaluation. The question about novelty compared to the prior student-teacher methods is the only factor pulling it down in my view.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work presents a light-weight guidance model that leverages the latent representation learned from the superior modality, when training a model that consumes only the inferior modality. The proposed approach is used for integrating MRI and pathology studies for two clinical use-cases. The concept is innovative and well described. Reviewer 1 has raised some concerns regarding model evaluation and statistical testing that could be addressed in the final submission.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

First and foremost, we would like to thank the reviewers for taking their time to review our paper and providing constructive feedback. We are glad that reviewers 2 (R2) and 3 (R3) appreciated the contribution of our work with strong accepts. In this response, we mainly focus on reviewer 1’s (R1) comments and try to address some of the misinterpretations.

1) R1 mentioned that the method is a typical feature-level fusion work. To clarify, yes, we agree it is a feature-level fusion work that is common in the multimodal classification area, but the contribution of our work is that we are trying to achieve with only the inferior modality what other works achieve with both inferior and superior modalities. Our proposed model, during inference time, uses only the inferior modality inputs while leveraging the knowledge of the superior modality. From our experiments, we show that the superior modality teacher network improves the performance achievable by the inferior modality student network.

2) R1 mentioned that the proposed method “didn’t work better than direct fusion of two imaging modalities”. We would like to clarify again that the contribution of our work is to leverage the existing multimodal data to improve the performance of only the inferior modality during inference time. The goal of our work is not to outperform the method that uses both superior and inferior modalities at inference. In fact, we believe that the performance of the multimodal model forms an upper bound on the performance achievable by our proposed model that only takes inferior modality inputs at inference time.

3) Suggestions from R1 included using statistical significance to validate the improvements and testing on multi-site data. We agree with R1 that a statistical analysis can help underscore the performance improvements of the proposed method. We shall compute the statistical significance of the improvements over the baseline method and add the results to the paper. Secondly, with respect to the multi-site testing, we agree that the contribution of our work would be stronger if we could use the multimodal dataset acquired from Hospital A to guide the inferior modality acquired from Hospital B. As we mention in the conclusion section of the paper, we believe that handling such distribution shifts is an important direction for future work in this domain.
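As an illustration of the kind of analysis discussed in point 3, the following is a minimal sketch of an exact paired sign-flip permutation test on per-split scores. The choice of test, the function name `paired_sign_flip_test`, and the score values are assumptions made for this example; they are not taken from the paper and are not necessarily the analysis the authors will report.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(baseline, proposed):
    """Exact two-sided paired permutation (sign-flip) test on the mean difference.

    Both arguments are per-split scores of equal length, paired by split.
    Returns the p-value under the null hypothesis of no systematic difference.
    """
    if len(baseline) != len(proposed):
        raise ValueError("score lists must have the same length")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    observed = abs(mean(diffs))
    as_extreme = 0
    total = 0
    # Under the null, each paired difference is equally likely to have either
    # sign, so enumerate all 2^n sign assignments (feasible for a few splits).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # tolerance for float round-off
            as_extreme += 1
    return as_extreme / total


# Hypothetical per-split scores, for illustration only (not results from the paper).
baseline_scores = [0.60, 0.62, 0.58, 0.61, 0.59]
proposed_scores = [0.64, 0.65, 0.60, 0.66, 0.62]
p_value = paired_sign_flip_test(baseline_scores, proposed_scores)
print(f"p = {p_value:.4f}")
```

With only a handful of splits the smallest attainable two-sided p-value is 2 / 2^n, so an exact test of this kind makes the limited resolution explicit instead of hiding it behind a normal approximation.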

4) R3 mentioned the novelty is “not 100% clear”. In general, cross-modal student-teacher (S-T) learning works on medical image segmentation, such as [11], [16], and [20], rely on the modalities being registered. This enables them to leverage the anatomical structure that is more evident in the teacher network (used as shape priors, for example) to improve the segmentation of the student network. [30], on the other hand, involves a classification task where the modalities do not necessarily have to be registered, which makes it the closest related work. As mentioned in the paper, they use the language modality to improve an imaging modality. Similar to the segmentation works listed above, their method encourages the student only to mimic the latent distribution of the teacher modality, without adding its own knowledge to the final prediction. Our student network, in addition to mimicking the teacher, adds its own knowledge, making our work an S-T learning-based multimodal classification method.
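The distinction drawn in point 4 (a student that mimics the teacher's latent representation and also keeps its own features) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the guidance model is assumed here to be a single linear map trained with an MSE loss, "G(I)+I" is assumed to denote concatenation, and all names (`guide`, `train_guidance`, `student_features`) and dimensions are hypothetical.

```python
import random


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def guide(weights, bias, inferior_latent):
    """Hypothetical linear guidance map G: inferior latent -> estimated superior latent."""
    return [sum(w * x for w, x in zip(row, inferior_latent)) + b
            for row, b in zip(weights, bias)]


def train_guidance(pairs, dim_in, dim_out, lr=0.05, epochs=500, seed=0):
    """Fit G by stochastic gradient descent on the MSE to the teacher's latent.

    `pairs` holds (inferior_latent, superior_latent) tuples; the superior
    latent is only needed here, at training time.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)]
               for _ in range(dim_out)]
    bias = [0.0] * dim_out
    for _ in range(epochs):
        for inferior_latent, superior_latent in pairs:
            pred = guide(weights, bias, inferior_latent)
            for j in range(dim_out):
                # Gradient of the MSE with respect to output j.
                grad = 2.0 * (pred[j] - superior_latent[j]) / dim_out
                for k in range(dim_in):
                    weights[j][k] -= lr * grad * inferior_latent[k]
                bias[j] -= lr * grad
    return weights, bias


def student_features(weights, bias, inferior_latent):
    """Classifier input at inference: G(I) concatenated with I (assumed reading of "G(I)+I")."""
    return guide(weights, bias, inferior_latent) + list(inferior_latent)


# Toy latents, for illustration only: the superior latent is a linear
# function of the inferior one, so a linear G can recover it.
toy_pairs = [([1.0, 0.0], [2.5]), ([0.0, 1.0], [-0.5]),
             ([1.0, 1.0], [1.5]), ([0.0, 0.0], [0.5])]
w, b = train_guidance(toy_pairs, dim_in=2, dim_out=1)
features = student_features(w, b, [1.0, 1.0])  # no superior input required
```

A student that only mimicked the teacher would pass `guide(...)` alone to the classifier; appending the inferior latent is what lets the student contribute its own information to the final prediction.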

5) R2 and R3 provided minor (but useful) comments on improving the readability of the paper. We are grateful for the comments and we shall incorporate the suggested changes. Special thanks to R3 for the insightful and interesting discussion. In one of the discussions, R3 mentioned that a cheap screening problem is not considered. We would like to clarify that our method is applicable in such a setting too, as it improves the performance of the cheaper modality, for example, from 90% to 92% in the discussed scenario, and alleviates the need for the superior modality (in a deep learning setup).


