Paper Info Reviews Meta-review Author Feedback Post-Rebuttal Meta-reviews

Authors

Jinghan Sun, Dong Wei, Liansheng Wang, Yefeng Zheng

Abstract

Medical images are widely used in clinical practice for diagnosis. Automatically generating interpretable medical reports can reduce radiologists’ burden and facilitate timely care. However, most existing approaches to automatic report generation require sufficient labeled data for training. In addition, the learned model can only generate reports for the training classes, lacking the ability to adapt to previously unseen novel diseases. To this end, we propose a lesion-guided, explainable few weak-shot medical report generation framework that learns the correlation between seen and novel classes through visual and semantic feature alignment, aiming to generate medical reports for diseases not observed in training. It integrates a lesion-centric feature extractor and a Transformer-based report generation module. Concretely, the lesion-centric feature extractor detects abnormal regions and learns correlations between seen and novel classes with multi-view (visual and lexical) embeddings. The detected regions and corresponding embeddings are then concatenated as input to the report generation module for explainable report generation, including text descriptions and the corresponding abnormal regions detected in the images. We conduct experiments on FFA-IR, a dataset providing explainable annotations, showing that our framework outperforms others on report generation for novel diseases.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_59

SharedIt: https://rdcu.be/cVRzd

Link to the code repository

https://github.com/jinghanSunn/Few-weak-shot-RG

Link to the dataset(s)

https://physionet.org/content/ffa-ir-medical-report/1.0.0/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multi-view lesion-guided few weak-shot learning framework for explainable medical report generation, which is the first model for such a complex task. Unlike existing report generation frameworks that use the global features of images for generation, this method uses regional lesion features by jointly learning with a lesion region detection task. For weak-shot learning, the method builds a soft label to exploit the semantic relationship between seen and novel diseases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors aim to tackle a novel task with clear clinical significance. The method is sufficiently innovative. Experiments show the proposed method achieves higher performance than the reference methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Poor readability. Although the paper is not technically difficult to read, its content is incoherent and the reader often needs to search up and down. For example, in Eq. 2, the explanation of “y” has to be traced back to the section “Problem Setting”. This causes great trouble in understanding important formulas.
    (2) Some mistakes appear in the formulas. In Eq. 2, it seems “i” has many possible values. Also, I am not sure whether the soft label “L^s” is one-hot coding, which would require all elements to sum to one. The explanation of the soft label is confusing.
    (3) What is the theoretical rationale for the claim that “using KL-divergence can force the network to learn the relationship between seen and novel diseases”?
    (4) I understand that the soft label establishes the relationship between seen and novel diseases, but why not just use the cross-entropy loss, since you can directly apply the softmax to obtain \hat{y}^s?
    (5) In the section “Few Weak-Shot Report Generation for Novel Diseases”, the model seems to involve complex operations in the inference stage, which are not shown in Fig. 2 (overview of the proposed approach).
    (6) In the comparison experiments, since the method is not the best on all metrics, the authors should give reasonable explanations. Also, the conclusion “Our method outperformed other state-of-art approaches to medical report generation in six experimental settings” is incorrect.
    (7) In Fig. 3, why not show the results of the compared method Grounded?
    (8) The authors do not discuss the generality of the proposed method, e.g., the data conditions necessary to use it and the generality of the proposed modules.
    (9) In the “Problem Setting”, we see “Each case in the dataset contains images of a patient at different periods”. Is this a necessary condition? The authors did not provide a discussion of the parameter “N”.
    (10) Also, is there a limitation on the type of weak annotations that can be provided?

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper lacks parameter details, so it is difficult to reproduce from the paper alone. The authors state that the code will be published in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The overly long introduction is unnecessary, since ZSL is not highly relevant to this paper. The problem setting is unclear. Some symbols lack corresponding explanations and some symbols may be reused; the authors need to check them carefully. Fig. 2 does not show the weight imprinting scheme and lacks an explanation of the dashed and solid lines and the colors. There is a lack of discussion of some important parameters.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This article may be innovative enough, but it has poor writing quality and insufficient experiments.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposes to leverage few weak-shot learning for medical report generation. The few-shot setting lies in potentially unseen lesion types in the test cases. The report generation is coupled with the lesion detection task and uses the detection results for guidance. To further improve the model, the authors propose a soft target based on lexical embeddings for training. Lexical features and visual features are combined before being fed to the generative model. Experiments on the Fundus Fluorescein Angiography Images and Reports dataset (FFA-IR) demonstrate the superiority of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is well organized by putting together all technical components in a systematic way. Connections and interactions across different components are clearly illustrated. Below are the detailed strengths of the paper:

    1, the few-shot setting is novel while making sense. From a system evolution perspective, it is highly likely that new classes are gradually added to the dataset. For a smooth system transition or evolution, the few-shot learning framework should be of great help and makes the solution more practical.

    2, the lexical embedding bridges different modalities and strengthens model training in a multi-task manner. This embedding also enables an easy cold start for future new classes and minimizes the effort of adapting models to new classes. This is achieved by enforcing a soft label, calculated from textual similarity, on the classification branch of Faster R-CNN with the RoI features.

    3, when generating the report, visual features and lexical embeddings are combined to guide the generation process. This can conceptually gear the report toward a more lesion-focused fashion.

    4, extensive experiments and promising results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have the following questions:

    1, does the few-shot setting also help the lesion detection task? I understand the main focus here is the text generation part, but I am still interested to see how detection performance is affected compared with the baseline Faster R-CNN.

    2, during inference, how are new classes classified? It is unclear to me, when the Q-set images are fed in, how the predictions from the seen classes and the novel classes will be combined. It seems that new classes are separated from the seen classes. Could you specify what the final prediction will look like from the detection perspective? Or, simply, do we not care about the class label?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Positive, if the code is released and the data partition is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please address my questions listed above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The few weak-shot setting is interesting. It provides a potential solution for the smooth evolution of machine learning systems that require expensive annotations.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper uses weak ZSL to transfer knowledge from a model trained on seen lesions/classes to unseen ones using images and pixels. It is a novel approach to reducing the human annotation required for retraining models on new diseases, using multi-view embeddings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a novel approach for reducing annotation requirements by using common global findings between seen and unseen classes. The promise of the methodology is seen in comparison with SOTA models and ablation studies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the model allows the use of common features between seen and unseen classes, it is unclear how large the overlap of these features needs to be.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors provided a checklist for reproducibility. The description of the methods, implementation and validation studies are also satisfactory.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Please comment on the implementation of this methodology between two classes where the feature overlap is not significant.
    2. The BLEU scores of the new method outperform the other ablation methods but are still moderate in absolute value. Are there strategies to improve the absolute score?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel approach, but it needs demonstration between two substantially different classes.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers liked the proposed idea using few weak-shot learning for data-efficient medical report generation but were concerned about readability, some possible mistakes in formula, and lack of clarity in certain parts of the paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank the reviewers for appreciating our work (especially its novelty) and the constructive comments.

Q1: Readability, clarity & rigor (R1) We will improve the manuscript thoroughly, explain symbols near formulae, eliminate reused symbols, and carefully proof formulae.

Q2: Explanation of soft label L^s (R1) We redefine the soft label for class y by l_c^s=Sim(W_y^e, W_c^e), replacing the subscript “i” with “y”. Intuitively, l_c^s measures cosine similarity of class y to class c in semantic space, and L^s collects class y’s similarity to all classes, both seen and novel, thus establishing correlations between them. Elements of L^s do not necessarily sum to one.
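The soft-label construction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the NumPy representation of the lexical embeddings W^e, and the toy matrix in the usage note are all illustrative assumptions; only the cosine-similarity definition l_c^s = Sim(W_y^e, W_c^e) comes from the rebuttal.

```python
import numpy as np

def soft_label(W_e: np.ndarray, y: int) -> np.ndarray:
    """Hypothetical sketch of the soft label L^s for ground-truth class y.

    W_e: (C, d) matrix of lexical embeddings for all C classes (seen + novel).
    Returns the vector l_c^s = Sim(W_y^e, W_c^e), i.e., the cosine similarity
    of class y to every class c. As the rebuttal notes, the entries do not
    necessarily sum to one (this is a similarity vector, not a distribution).
    """
    normed = W_e / np.linalg.norm(W_e, axis=1, keepdims=True)  # row-wise L2 norm
    return normed @ normed[y]
```

For instance, with embeddings `[[1,0],[0,1],[1,1]]` and `y=0`, the soft label is `[1, 0, 1/sqrt(2)]`: class 0 is maximally similar to itself, orthogonal to class 1, and partially similar to class 2.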

Q3: Rationality of using KL divergence for seen-novel relationship learning; why not cross-entropy loss (R1) By aligning visual features with lexical embeddings via a KL divergence on the soft label (see Q2), the visual feature extractor is forced to learn the relationship from the semantic space. Cross-entropy is functionally similar in theory, but with an empirically worse outcome.
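The KL-based alignment can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the function name, the temperature parameter, and the choice of softmax normalization on both sides are assumptions; the idea carried over from the rebuttal is that the visual prediction is pulled toward the semantic soft-label distribution via KL divergence.

```python
import numpy as np

def kl_alignment_loss(visual_sim: np.ndarray, soft_lbl: np.ndarray,
                      tau: float = 1.0) -> float:
    """Illustrative KL alignment of visual and semantic predictions.

    visual_sim: similarities of an RoI visual feature to each class embedding.
    soft_lbl:   the semantic soft label (cosine similarities, see Q2).
    Both are turned into distributions with a softmax; the loss is
    KL(target || prediction), which is 0 when the two agree.
    """
    def softmax(x: np.ndarray) -> np.ndarray:
        z = np.exp((x - x.max()) / tau)  # shift for numerical stability
        return z / z.sum()

    p = softmax(soft_lbl)     # target distribution from the semantic space
    q = softmax(visual_sim)   # predicted distribution from visual features
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

Minimizing this loss drives the visual similarity profile toward the semantic one, which is the stated mechanism for transferring seen-novel relationships into the visual feature extractor.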

Q4: Figs. 2 & 3 (R1) We will add inference operations and line & color legend to Fig. 2, and result of Grounded to Fig. 3.

Q5: Explain non-best metrics (R1) Our method was not the best for some METEOR and ROUGE scores (Table 1). For METEOR, we conjecture the discrepancy between the FFA-IR vocabulary and WordNet impairs its reliability here. Also, our method utilizes inter-class relationship to produce semantically correct reports, yet ROUGE measures word correspondence rather than semantic correctness.

Q6: Problem setting & generality (R1, R3); comment on two substantially different classes (R3) A prerequisite for our method is that the seen classes are diverse and have a sufficient amount of data, to ensure commonalities with novel classes. In the clinical context, the seen classes are abundant common diseases and the novel ones can be rare or newly added diseases. While it is hard for our method to work in the extreme case of only two completely different classes, the prerequisite can be easily satisfied in the intended use scenario. It would be interesting to quantify how large an overlap is necessary in future work. Besides, N is not a parameter but a property of the data. Our method can also generalize to non-periodical data, e.g., N views in a chest X-ray. Lastly, our proposed architecture has the potential to use other weak annotations. In this work, we used bounding boxes for proof of concept.

Q7: Parameter details (R1) & data partition (R2) We will elaborate parameter settings and release our code and data partition. We will also add a discussion of the few-shot number K, e.g., experiments show that larger K’s yield better results, and that with K=1, our method still outperforms Grounded and R2Gen in all 7 metrics by 0.004-0.116 for Test-Novel (32/5/9) setting.

Q8: If the few-shot setting helps detection (R2) Yes. With our coupled few-shot setting and semantic prediction, the Recall@100 increases by 15% over a vanilla Faster R-CNN trained on the support set.

Q9: New disease classification for inference (R2) Via weight imprinting [20], classifier weights for novel classes are computed by averaging the normalized visual features of lesion regions in support images, and are used alongside the classifiers of the seen classes for combined classification of both seen and novel classes. Then, for an image in set Q, its visual prediction is given by the classifiers above, its semantic prediction by a softmax over similarities to the lexical embeddings (both seen and novel; below Eq. 2), and the final prediction by averaging the visual and semantic predictions.
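The inference procedure described in Q9 can be sketched in a few lines. This is an illustrative reading, not the authors' code: the function names are invented, and the sketch assumes the visual features and lexical embeddings live in (or are projected to) a common dimension so that both branches can score the same feature vector.

```python
import numpy as np

def imprint_weights(support_feats: np.ndarray) -> np.ndarray:
    """Weight imprinting for one novel class: average the L2-normalized
    visual features of support lesion regions, then re-normalize to get
    the classifier weight vector (in the spirit of [20])."""
    normed = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    w = normed.mean(axis=0)
    return w / np.linalg.norm(w)

def combined_prediction(feat: np.ndarray, W_vis: np.ndarray,
                        W_lex: np.ndarray) -> np.ndarray:
    """Average the visual prediction (imprinted + seen classifiers, W_vis)
    and the semantic prediction (softmax over similarity to lexical
    embeddings, W_lex), as described in the rebuttal."""
    f = feat / np.linalg.norm(feat)

    def softmax(x: np.ndarray) -> np.ndarray:
        z = np.exp(x - x.max())
        return z / z.sum()

    return 0.5 * (softmax(W_vis @ f) + softmax(W_lex @ f))
```

Imprinting lets novel-class classifiers be created from a handful of support regions without retraining, which is what makes the combined seen+novel classification possible at inference time.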

Q10: BLEU scores (R3) Compared to the fully supervised SOTA approaches (incl. [4,28]) on the same dataset [15], our B1 & B2 scores were higher, B3 scores were comparable, and B4 scores had room for improvement, even though our few weak-shot setting was more challenging. In the future, we plan to employ more advanced few-shot detection methods to improve the absolute scores.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper received one accept, one weak accept, and one weak reject recommendation. The main concerns raised by the reviewers relate to readability, lack of clarity, and missing details. All reviewers commented on the novelty of the approach and are very interested in the problem being tackled. The rebuttal submitted by the authors is good. The authors promised to correct all minor mistakes, add definitions for symbols, and clarify details. The questions raised by Reviewer 1 and Reviewer 2 are answered satisfactorily. Reviewer 3 (recommending weak reject) raised a question that was not really answered in the rebuttal, but that question is very hard and the whole review is brief. Considering the novelty of the approach and the authors’ intention to make minor changes to address reviewer concerns, the paper should be acceptable.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposed a multi-view lesion guided few weak-shot learning for explainable medical report generation. Following my reading of the paper, reviews, and rebuttal, it seems the authors have addressed most of the concerns. Recommend to accept and ask the authors to reflect the rebuttal points in the paper if finally accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a report generation method from images, with lesion detection. Most of the reviews are positive. The meta-reviewer recommends acceptance. A few evaluation metrics are used in the paper. It might be interesting to consider a few more criteria/questions for the related research community: will a human be able to tell which report is generated by a computer and which is written by a doctor? If the report is generated by a computer, will the patient be willing to trust it?

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3
