
Authors

Amarachi B. Mbakwe, Lyuyang Wang, Mehdi Moradi, Ismini Lourentzou

Abstract

Chest radiography is a commonly used diagnostic imaging exam for monitoring disease progression and treatment effectiveness. While machine learning has made significant strides in tasks such as image segmentation, disease diagnosis, and automatic report generation, more intricate tasks such as disease progression monitoring remain fairly underexplored. This task presents a formidable challenge because of the complex and intricate nature of disease appearances on chest X-ray images, which makes distinguishing significant changes from irrelevant variations between images challenging. Motivated by these challenges, this work proposes CheXRelFormer, an end-to-end siamese Transformer disease progression model that takes a pair of images as input and detects whether the patient’s condition has improved, worsened, or remained unchanged. The model comprises two hierarchical Transformer encoders, a difference module that compares feature differences across images, and a final classification layer that predicts the change in the patient’s condition. Experimental results demonstrate that CheXRelFormer outperforms previous counterparts. Code is available at https://github.com/PLAN-Lab/CheXRelFormer
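
For orientation, here is a minimal PyTorch sketch of the overall siamese difference design described in the abstract. The stand-in convolutional encoder, feature dimension, and layer names are illustrative assumptions, not the authors’ actual hierarchical Transformer implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class SiameseDifferenceModel(nn.Module):
    """Sketch of a siamese difference classifier: a shared encoder
    processes both images, a difference module compares the two
    feature vectors, and a linear head predicts one of three classes
    (improved / worsened / no change)."""

    def __init__(self, feat_dim: int = 256, num_classes: int = 3):
        super().__init__()
        # Stand-in encoder; the paper uses hierarchical Transformer encoders.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=16, stride=16),
            nn.GELU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Learned difference module over the concatenated feature pair.
        self.difference = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x_prev, x_curr):
        f_prev = self.encoder(x_prev)  # shared weights: the siamese part
        f_curr = self.encoder(x_curr)
        diff = self.difference(torch.cat([f_prev, f_curr], dim=-1))
        return self.classifier(diff)   # logits over the progression labels

model = SiameseDifferenceModel()
logits = model(torch.randn(2, 1, 224, 224), torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```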

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_66

SharedIt: https://rdcu.be/dnwIc

Link to the code repository

https://github.com/PLAN-Lab/CheXRelFormer

Link to the dataset(s)

https://physionet.org/content/chest-imagenome/1.0.0/

https://physionet.org/content/mimic-cxr-jpg/2.0.0/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a siamese transformer model, “CheXRelFormer”, for disease progression classification from pairs of chest X-ray images. The model uses two hierarchical transformer feature extractors and compares the extracted feature differences before classification. The method outperforms three compared methods when evaluated on a private dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Experimental results outperform the three compared methods.
    • The approach of using vision transformers and a siamese model architecture for this particular problem (disease progression classification in chest X-rays) seems novel.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is no motivation for why the authors chose to compare against these three baseline methods and no others (what methods are SOTA in this field?). The authors claim that one of the compared methods is SOTA (CheXRelNet), but this particular method has only been evaluated on the same private dataset, which makes the claim weak.
    • The experimental setup for the three compared methods is not described at all (hyperparameters, training strategy, etc.).
    • The statistical significance of the claimed improvements is not reported.
    • The dataset used for evaluation seems to be private.
    • The ablation study is not very detailed and quite non-standard. A reasonable ablation study would investigate (1) the impact of the ViT architecture, (2) the impact of the siamese architecture, (3) the impact of the learned feature “difference module”, and (4) the impact of using several multi-level difference modules. Right now, it is very hard to draw any conclusions from the ablation study: “absdiff” investigates several design choices at once (learned difference module, multi-level features), while “local” investigates another kind of input format (mentioned neither earlier nor later in the paper).
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Despite stating so in the reproducibility report, the authors do not report “the range of hyper-parameters considered” or “method to select the best hyper-parameter configuration”.
    • Details regarding compared baselines are missing (hyperparameters, training strategy, model architecture).
    • Details regarding the ablation study are missing (patch pre-processing).
    • The authors state that “A description of results with central tendency (e.g. mean) & variation (e.g. error bars).”, “An analysis of statistical significance of reported differences in performance between methods.” and “A description of the memory footprint.” are not applicable, which I do not agree with.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The authors write that “(…) y_i,m ∈ {0, 1} indicates whether the pathology m appearing in the image pair has improved or worsened or remained the same”. This does not make sense, since improved/worsened/remained the same are three different states while y_i,m is defined as binary?
    • Pages 3–4 should probably be condensed to a shorter version in which most of the (by now very well-known) vision transformer equations are replaced with references to previous works.
    • The qualitative evaluation seems a bit thin at the moment. I suggest including more qualitative examples, and examples from one or several of the compared baselines as well as all setups in the ablation study.
    • The method should be evaluated on a public dataset, or the authors should motivate why not to.
    • In the ablation study, investigate (1) the impact of the ViT architecture, (2) the impact of the siamese architecture, (3) the impact of the learned feature “difference module”, and (4) the impact of using several multi-level difference modules.
    • The experimental setup for the three compared methods should be described in detail (hyperparameters, training strategy, etc).
    • The statistical significance of the claimed improvements should be reported, and the results should include central tendency + variation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental evaluation is shaky (private dataset, compared methods neither motivated nor described, no analysis of statistical significance, inferior reproducibility), which outweighs the paper’s merits (a seemingly novel method for this particular problem).

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Upon reviewing the authors’ rebuttal, I have reconsidered my recommendation from “reject” to “weak accept.” This change is contingent upon the authors fulfilling their commitment to significantly enhance the experimental section. This should involve providing more detailed information on baseline models, improving the ablation studies, incorporating statistical analysis, and improving the qualitative evaluation. However, despite this positive shift, I still have some concerns that prevent me from offering a stronger recommendation. These concerns include:

    • Transparency regarding access to the “Chest ImaGenome” dataset: The authors should be more forthcoming about how to access this dataset. Currently, the paper only refers to the original publication without providing additional details. In their reproducibility response, the authors state that they have included “A link to a downloadable version of the dataset (if public). [Yes]”, which they have not. Further, it appears that the dataset is actually restricted and requires MIT training to access. The authors should address this discrepancy and be more transparent about the dataset’s accessibility.
    • Clarification on baseline models: In their rebuttal, the authors state, “R1 Baseline: this research area is relatively new and has fewer works. To the best of our knowledge, the compared baseline models appear to be the SOTA.” To support this claim, the authors should provide references that validate their statement. Additionally, the paper should include motivations for not considering other baseline methods, such as the methods in [11] and [12] (already cited by the paper).



Review #2

  • Please describe the contribution of the paper

    This paper proposes a transformer-based model called CheXRelFormer for disease/abnormality progression prediction in CXRs. The proposed model consists of two transformer-based encoders and a difference module. Experiments on the Chest ImaGenome dataset showed the effectiveness of the proposed model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper proposes a transformer-based model for disease/abnormality progression, consisting of two transformers and a difference module, which avoids the need to register images in order to monitor differences. 2) The paper is well structured and clear to read.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) It is not clear whether the two transformers of (X, X’) in Fig. 1 extract features from the correct region and whether the feature differences are computed for the same region. Using CAM or one of its variants to check the model’s feature extraction is recommended to make the results more convincing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of the paper is good. The authors provide almost everything needed to repeat the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) It is recommended to use CAM or one of its variants to check whether the two transformers extract features for the same region, or for a correct region. 2) Section 3.3: please provide more details about Local, Global, and CheXRelNet, e.g., summarize the pipeline of each method. 3) Table 2 compares the performance of the different methods, and Local and Global perform worse. Please clarify whether these methods include registration. If not, how about first registering the images and then applying the Local/Global method? 4) Regarding the title, it seems “abnormality” is more suitable than “disease”.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a framework for predicting disease/abnormality progression in CXRs, which is very useful in the clinic and can be extended to many disease progression settings, such as cancer progression. However, some details are not clarified in the paper, including the state-of-the-art methods as well as the reason why the proposed model outperforms other methods.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a new Siamese model to classify patient condition progression for 9 diseases from chest X-ray image pairs. The proposed model fuses multi-scale features from Transformer blocks and then feeds them through a combination of convolution and MLP layers to classify the condition progression. Experiments are conducted on the Chest ImaGenome dataset to compare the proposed method with CheXRelNet, a state-of-the-art method for progression classification. The proposed method, CheXRelFormer, outperforms CheXRelNet in 6 out of 9 disease categories.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper proposes a simple, straightforward Siamese model for disease condition progression, and the results demonstrate that it is effective and outperforms the state-of-the-art method CheXRelNet.

    • The analysis of the experimental results is informative, comparing variants of CheXRelNet and the proposed method. It provides insights into how the model makes predictions using long-range and global features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although CheXRelFormer outperforms CheXRelNet overall and in 6 out of 9 diseases, why isn’t it outperforming in the remaining 3? Some analysis of this is needed.

    • How are the ROIs in CheXRelFormer_Local defined?

    • Why does CheXRelFormer_AbsDiff have fewer parameters? Could this be the reason it’s underperforming?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible, the paper included technical details needed for re-implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • More introduction to the dataset is needed, such as how long after the first X-ray image the second one is obtained, and what the dataset distribution is across the different progression labels (no change, worsened, improved).

    • Many building blocks of the proposed method are standard modules for vision models. There is no need to write so many equations.

    • In Fig 2, some visual pointers to where the models focus for prediction would be useful. Otherwise, showing the image pairs doesn’t provide much information.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper contributes to advancing an underexplored sub-field: disease progression with AI models. The proposed method proves to be more accurate than the state of the art. There are some minor issues with this paper which can be addressed.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received three reviews: two weak accepts and one reject. From the detailed comments, this is a paper with mixed opinions, so we invite the authors for a rebuttal.




Author Feedback

We thank reviewers and ACs for their valuable constructive feedback, and acknowledgment of the paper’s novelty, clarity, and methodological suitability for the task at hand.

R1, R2, R3 Qualitative evaluation: this was performed using attention rollout. The produced attention maps clearly showed the model’s focus regions, confirming that the model concentrated on the correct regions in the image pairs. Results will be added in the final draft.
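
Attention rollout (Abnar & Zuidema, 2020) aggregates attention across Transformer layers by recursively multiplying per-layer attention matrices, with the identity added to account for residual connections. The following is a generic sketch of the technique, assuming per-layer attention tensors of shape (num_heads, seq_len, seq_len); it is not the authors’ exact visualization code.

```python
import torch

def attention_rollout(attentions):
    """Recursively multiply per-layer attention matrices, adding the
    identity for residual connections and re-normalizing rows.

    attentions: list of (num_heads, seq_len, seq_len) tensors, one per layer.
    Returns an aggregated (seq_len, seq_len) attention map.
    """
    rollout = None
    for attn in attentions:
        a = attn.mean(dim=0)                 # average over heads
        a = a + torch.eye(a.size(-1))        # account for residual connection
        a = a / a.sum(dim=-1, keepdim=True)  # row-normalize
        rollout = a if rollout is None else a @ rollout
    return rollout

# Toy usage: 4 layers, 8 heads, 50 tokens.
maps = [torch.rand(8, 50, 50).softmax(dim=-1) for _ in range(4)]
print(attention_rollout(maps).shape)  # torch.Size([50, 50])
```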

R1, R2 Baseline setup: The Local model is a siamese network with a pretrained ResNet101 autoencoder trained on cropped ROIs; Global is a siamese network similar to Local but trained on the entire image; CheXRelNet is a 2-layer graph neural network with a ResNet101 autoencoder (see Section 3.3 and [9]). All code (preprocessing, model, baselines, etc.) will be released, and more details about the baseline setup will be included in the camera-ready version.

R1 Statistical analysis: Accuracy and standard deviation (SD) over 3 trials: CheXRelFormer (0.493 ± 0.0012) vs. CheXRelNet (0.468 ± 0.0041). A one-tailed t-test between the CheXRelFormer and CheXRelNet accuracies (over 3 trials) yields p = 0.00027, demonstrating the significant performance gain of our proposed CheXRelFormer model over CheXRelNet. The SD and t-test results are evidence that the proposed method is effective and outperforms the baselines. We will add these to the final draft.
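
For reference, the comparison above corresponds to a one-tailed two-sample t-test over per-trial accuracies. A minimal SciPy sketch follows; the per-trial values are placeholders chosen for illustration only, since the actual trial results are not listed in the feedback.

```python
from scipy import stats

# Placeholder per-trial accuracies (illustrative only; the authors report
# means/SDs of 0.493 ± 0.0012 and 0.468 ± 0.0041 over 3 trials).
chexrelformer_acc = [0.4918, 0.4930, 0.4942]
chexrelnet_acc = [0.4639, 0.4680, 0.4721]

# One-tailed test: H1 is that CheXRelFormer's mean accuracy is greater.
t_stat, p_value = stats.ttest_ind(
    chexrelformer_acc, chexrelnet_acc, alternative="greater"
)
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.5f}")
```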

R1 Dataset: the data are publicly available (Chest ImaGenome [23] and MIMIC-CXR [7]). We reiterate that no private data have been used. Data reconstruction code will be made public.

R1 Ablation studies: we ran new ablations replacing the ViT architecture with a CNN architecture (with difference modules), which achieves 0.45 accuracy with 52M parameters, while our model reaches 0.49 accuracy with 41M parameters. Some of the suggested ablation studies have already been conducted and can be found in the supplementary material. Specifically, our paper includes CheXRelFormer_AbsDiff to examine the effect of the difference module, CheXRelFormer_Local to investigate the impact of global image input, and notably CheXRelFormer_LastBLK and CheXRelFormer_AbsDiffL (included in the supplementary material) to assess multi-level features. We will provide more detailed explanations in the final draft.

R1 Baseline: this research area is relatively new and has few prior works. To the best of our knowledge, the compared baseline models appear to be the SOTA.

R1 Statement: the claim that we marked “A description of results with central tendency (e.g. mean) & variation (e.g. error bars)”, “An analysis of statistical significance of reported differences in performance between methods”, and “A description of the memory footprint” as not applicable is a misrepresentation, as our paper makes no such statements. We agree that statistical significance is important. We have reported it and will include more details in the final draft.

R1 ‘(…) y_i,m ∈ {0, 1}’ is a typo; it should read ‘(…) y_i,m ∈ {0, 1, 2}’. We will correct this in the final draft.

R3 Dataset: we will add more dataset statistics to the camera-ready version. The label distribution is: improved: 12,396; worsened: 12,287; no change: 11,205.

R3 Underperforming diseases: analysis using a confusion matrix showed that, for enlarged cardiac silhouette (ECS), 65% of improved cases and 60% of worsened cases were classified correctly, while only 31% of no-change cases were classified correctly. For ECS, detecting no change is challenging due to the lack of spatial registration, which can result in slightly different geometry/size between images regardless of disease progression. Pneumothorax is a consistently difficult condition due to the fine-grained nature of its visual signature.
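
The per-class percentages above are class-wise recall, i.e. the diagonal of a row-normalized confusion matrix. A minimal scikit-learn sketch, with randomly generated dummy labels standing in for the actual predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

label_names = ["improved", "worsened", "no change"]

# Dummy ground truth and predictions for illustration only.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=500)
y_pred = rng.integers(0, 3, size=500)

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
# Row-normalizing gives per-class recall, the "X% of <class>
# classified correctly" figures quoted above.
recall = cm.diagonal() / cm.sum(axis=1)
for name, r in zip(label_names, recall):
    print(f"{name}: {r:.0%} classified correctly")
```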

R3 ROIs in CheXRelFormer_Local are defined based on the ROI bounding boxes in the dataset. The model was trained on cropped ROIs.

R2 Registration: Accurate registration of 2D projection X-rays is difficult and could introduce errors which are hard to measure. A solution without spatial registration is highly desirable.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the rebuttal, all three reviewers agree to accept this work. The problem the authors target is important, and the approach is sufficiently novel.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the rebuttal, all three reviewers support the acceptance of this paper. Along with my reading of the paper and the rebuttal, a decision of accept is recommended based on the overall quality of the paper. However, I hope that the authors will revise the paper per the reviewers’ suggestions in the official version.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided a good rebuttal, and one of the reviewers increased their score. As a result, the final score is among the higher ones in my pool.


