
Authors

Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, C.-C. Jay Kuo, Aichi Chien, Kai-Wei Chang

Abstract

Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. These models can potentially deal with tasks that involve understanding medical images along with their associated text. However, applying V&L models in the medical domain is challenging due to the high cost of data annotation and the requirement for domain knowledge. In this paper, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between clinical notes and medical images.

Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art, while being trained on a 9× smaller dataset.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_69

SharedIt: https://rdcu.be/cVRzn

Link to the code repository

https://github.com/monajati/BERTHop

Link to the dataset(s)

https://openi.nlm.nih.gov/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a transformer-based vision & language (V&L) model based on a recent image feature learning method named PixelHop++ and on BlueBERT, which has been trained on biomedical and clinical datasets. The results show that the proposed method can better capture the associations between clinical notes and medical images, yielding higher classification results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This research shows the feasibility of performing V&L analysis and disease diagnosis on small medical datasets without labels.
    2. Various comparative experimental analysis shows the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the results show the effectiveness of using PixelHop++, I was not able to clearly understand why the use of PixelHop++ improves the learning outcome.
    2. The description of PixelHop++ needs to be better articulated: what do you mean by an image at different frequencies? Why does this improve the result?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It looks possible to reproduce the proposed method, as it is based on a combination of existing methods.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. While the results show the effectiveness of using PixelHop++, I was not able to clearly understand why the use of PixelHop++ improves the learning outcome.
    2. The description of PixelHop++ needs to be better articulated: what do you mean by an image at different frequencies? Why does this improve the result?
    3. The writing of the paper needs to be improved; there are many grammatical errors and repetitions throughout.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important research topic. The experiment presented in the paper is good, but the description/contribution of the proposed method is not well articulated.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I thank the authors for clarifying the motivation for using PixelHop++. The use of PixelHop++ is now better justified for X-ray image analysis. I have changed my decision to ‘weak reject’.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a Vision and Language model for better capturing the associations between clinical notes and medical images. In particular, PixelHop++ is used to extract features from images, which are then processed by a visual transformer together with language embeddings extracted from the text report. The results demonstrate the validity of the method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. Dataset and training procedure are well presented. The effectiveness of the choices made is demonstrated by testing different feature extractors and different transformer backbones.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • How is PCA applied to the PixelHop++ channels?
    • The pre-processing should be explained, rather than only saying that it is the same as in TieNet.
    • Trying the method on other datasets as well could further confirm its effectiveness.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is public and the hyper-parameters have been listed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Some aspects could be better explained.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and easy to follow, and a good evaluation of the architecture has been conducted.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper proposes BERTHop, a transformer-based Vision and Language model that is applied to medical images. The visual encoder of the V&L architecture in BERTHop is implemented with PixelHop++. This is unsupervised and reduces the dependency on labeled data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    BERTHop implements PixelHop++ for unsupervised visual feature learning on medical images. This helps specifically when labeled data is not available, which makes it a perfect use case for medical tasks.

    The paper reports detailed disease-wise results and compares them with two baselines. The improvement in AUC is significant. Also, a comparison of different transformer backbones is shown.

    The argument for replacing BUTD, as it fails to detect certain medical abnormalities, is supported by convincing results in Table 1.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the results support the claims of the paper, the reviewer is concerned about the generalizability of the proposed architecture on other datasets. Also, the reviewer is interested in understanding how the results change as the degree of unlabelled data is varied, i.e., what the disease-wise AUC would be when 10%, 20%, 30%, and so on, of the data is labeled. This is because a mixture of labeled and unlabelled data is common in medical datasets. Some areas of the paper were grammatically difficult to follow.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors did not declare whether code will be made public. The individual model components are publicly available making the proposed architecture reproducible. The dataset used is publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The simplicity of the architecture cannot be countered with a lack-of-novelty argument. Also, detailed comparisons are reported with different backbones. Please clarify whether the method is an extrapolation of the concept of visual Q/A, and whether the authors have previously tested it on non-medical datasets.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed architecture is simple and intuitive. The results are convincing and strongly support the arguments presented.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper reports the use of recent language and image learning models for learning disease labels in chest X-rays. There are several points that the authors need to carefully consider in their response: 1) justification for the choice of PixelHop; 2) what is meant by image frequencies (is “scale” what the authors mean?); 3) the issues with writing highlighted in the reviews need to be addressed.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank all reviewers and the area chair for their valuable comments.

Justification for the choice of PixelHop++: PixelHop++ is an unsupervised feature learner that has been shown in the literature to be highly effective when little or no labeled data is available. In our preliminary experiments, we found that this visual representation approach is suitable for modeling X-ray images and that it can capture various levels of detail in an unsupervised manner. During training, PixelHop++ uses PCA to learn the axes that maximize the variations of the data, i.e., it aims to preserve the distinctions between input patches, which are likely to be related to abnormal regions in X-ray images. We map image patches onto these learned axes to extract discriminative features from them. While requiring no labeled data due to its unsupervised nature, PixelHop++ plays an important role by efficiently offering an image representation that also preserves the information of abnormal regions.

Detailed Description of PixelHop++: we will include more details about PixelHop++ in the revision. Suppose that we have N training images of size s1 × s2 × d, where d is 1 for gray-scale and 3 for color images. They are all fed into a single “PixelHop++ unit” in the first level of the “PixelHop++ model” (note that there are one or more PixelHop++ units in each level of a PixelHop++ model). The goal of training a PixelHop++ unit is to compute linearly independent projection vectors (kernels) that can extract strong features from its input data. In the first step of processing data in a PixelHop++ unit, patches are extracted from each training image using a sliding window of size w × w × d and a stride of s and then flattened, i.e., xi1, xi2, …, xiM, where xij is the j-th flattened patch of image i and M is the number of patches extracted per image. In the second step, the set of all patches extracted from the training images is used to compute the kernels of the PixelHop++ unit. The kernels are computed as follows:

  1. The first kernel, called the DC kernel, is the mean filter, which extracts the mean of each input vector.
  2. After computing the mean (DC component) of each vector, the PCA kernels of the residuals are computed and stored as AC kernels. The first k PCA kernels are the top k orthogonal projection vectors that best capture the variation of the residuals. Each image patch is then projected onto the computed kernels and a scalar bias is added to the projection result. Transforming xi1, xi2, …, xiM with one kernel of a PixelHop++ unit generates one output channel. For example, in the first level of the model, the PixelHop++ unit generates 1 DC channel and w × w × d − 1 AC channels (a minimal illustrative sketch of this procedure is given after this list).
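
For concreteness, the following is a minimal numpy sketch of training and applying a single PixelHop++ unit as described in the two steps above. It is illustrative only, not the authors’ released implementation: the function names (extract_patches, train_pixelhop_unit, apply_unit), the full-rank PCA via an eigendecomposition of the covariance, and the scalar bias handling are simplifying assumptions.

    import numpy as np

    def extract_patches(images, w, stride):
        """Slide a w x w window over each image and flatten each patch.

        images: array of shape (N, H, W, d); returns (N * M, w * w * d).
        """
        N, H, W, d = images.shape
        patches = []
        for img in images:
            for r in range(0, H - w + 1, stride):
                for c in range(0, W - w + 1, stride):
                    patches.append(img[r:r + w, c:c + w, :].reshape(-1))
        return np.asarray(patches)

    def train_pixelhop_unit(patches):
        """Compute the DC (mean-filter) kernel and the AC (PCA) kernels of one unit."""
        dim = patches.shape[1]
        dc_kernel = np.ones(dim) / np.sqrt(dim)            # DC kernel: normalized mean filter
        dc_response = patches @ dc_kernel                  # DC component of each patch
        residuals = patches - np.outer(dc_response, dc_kernel)
        # PCA of the residuals: eigenvectors of the covariance, sorted by variance.
        cov = np.cov(residuals, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        ac_kernels = eigvecs[:, order].T                   # rows are AC kernels, strongest first
        ac_energy = eigvals[order]                         # variance captured by each AC kernel
        return dc_kernel, ac_kernels, ac_energy

    def apply_unit(patches, dc_kernel, ac_kernels, bias=0.0):
        """Project patches onto the kernels; each kernel yields one output channel."""
        dc_channel = patches @ dc_kernel                   # 1 DC channel
        ac_channels = patches @ ac_kernels.T + bias        # up to w*w*d - 1 useful AC channels
        return dc_channel, ac_channels

Applying apply_unit with the DC kernel and the top k AC kernels yields the 1 DC channel and the AC channels mentioned above.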

In the last step, model pruning is performed to remove the channels that carry negligible information. The ratio of the variance explained by a kernel to the variance of the training data is called the “energy ratio” of that kernel (or of its corresponding channel) and is used as the criterion for pruning the model. An energy-ratio threshold value, E, is selected and model pruning is performed using the following rule (see the sketch after this list):

  1. If the energy ratio of a channel is less than E, the channel is discarded, as the variation of the data along the corresponding kernel is very small.
  2. If it is greater than E, the channel is forwarded to the next level for further energy compaction.
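
A minimal sketch of this pruning rule under the same assumptions, reusing the ac_energy values returned by train_pixelhop_unit in the sketch above (the default threshold is an arbitrary placeholder, not the setting used in the paper):

    import numpy as np

    def prune_channels(ac_energy, total_variance, E=0.002):
        """Split AC channels into discarded and forwarded sets by energy ratio.

        ac_energy: variance captured by each AC kernel (from train_pixelhop_unit)
        total_variance: variance of the training data at this level
        E: energy-ratio threshold (placeholder value)
        """
        energy_ratio = ac_energy / total_variance
        discarded = np.where(energy_ratio < E)[0]    # negligible variation: drop
        forwarded = np.where(energy_ratio >= E)[0]   # enough energy: send to the next level
        return discarded, forwarded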

Each intermediate output channel generated by a PixelHop++ unit is fed into a separate PixelHop++ unit in the next level. Thus, except for the first level of the model, the other levels contain more than one PixelHop++ unit. A more detailed explanation can be found in the paper titled “PixelHop++: A Small Successive-Subspace-Learning-Based (SSL-based) Model for Image Classification”.
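
As a rough illustration of this cascade, the sketch below builds one level of the tree on top of the helper functions from the previous sketches; treating every forwarded AC channel as a single-channel image for the next level is a simplification of the actual PixelHop++ pipeline.

    def train_level(channel_images, w, stride, E):
        """Train one level: one PixelHop++ unit per input channel, then cascade.

        channel_images: list of arrays of shape (N, H, W, 1), one per input channel.
        Returns the single-channel images to be fed to the next level's units.
        """
        next_level_inputs = []
        for imgs in channel_images:
            patches = extract_patches(imgs, w, stride)
            dc_k, ac_k, ac_energy = train_pixelhop_unit(patches)
            dc, ac = apply_unit(patches, dc_k, ac_k)
            total_variance = patches.var(axis=0).sum()
            _, forwarded = prune_channels(ac_energy, total_variance, E)
            N = imgs.shape[0]
            H2 = (imgs.shape[1] - w) // stride + 1
            W2 = (imgs.shape[2] - w) // stride + 1
            # Each forwarded channel becomes a spatial map that feeds its own
            # PixelHop++ unit in the next level.
            for ch in forwarded:
                next_level_inputs.append(ac[:, ch].reshape(N, H2, W2, 1))
        return next_level_inputs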

Clarification on image representation at different frequencies: after computing the AC kernels, we project patches onto the computed kernels and pick the top k projection vectors. These vectors represent image variations at different frequencies, from highest to lowest.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal had a positive impact on the reviews. My original summary was addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided a rebuttal that addressed the reviewers’ comments. After the rebuttal, the reviewers unanimously agree that the paper is a good contribution and recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the rebuttal, all three reviewers reached an accept conclusion. The AC also agrees with their arguments and comments. This paper has good technical novelty and several interesting ideas that are very useful for the literature developing this line of work. Please be aware that OpenI is an ideal but very limited dataset in terms of comprehensiveness for real-world disease diagnosis in chest X-ray images.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


