
Authors

Qiangguo Jin, Changjiang Zou, Hui Cui, Changming Sun, Shu-Wei Huang, Yi-Jie Kuo, Ping Xuan, Leilei Cao, Ran Su, Leyi Wei, Henry B. L. Duh, Yu-Pin Chen

Abstract

Sarcopenia is a condition of age-associated muscle degeneration that shortens the life expectancy in those it affects, compared to individuals with normal muscle strength. Accurate screening for sarcopenia is a key process of clinical diagnosis and therapy. In this work, we propose a novel multi-modality contrastive learning (MM-CL) based method that combines hip X-ray images and clinical parameters for sarcopenia screening. Our method captures the long-range information with Non-local CAM Enhancement, explores the correlations in visual-text features via Visual-text Feature Fusion, and improves the model’s feature representation ability through Auxiliary contrastive representation. Furthermore, we establish a large in-house dataset with 1,176 patients to validate the effectiveness of multi-modality based methods. Significant performances with an AUC of 84.64%, ACC of 79.93%, F1 of 74.88%, SEN of 72.06%, SPC of 86.06%, and PRE of 78.44%, show that our method outperforms other single-modality and multi-modality based methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_9

SharedIt: https://rdcu.be/dnwJr

Link to the code repository

https://github.com/qgking/MM-CL.git

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The paper presents a multi-modality contrastive learning model that integrates hip X-ray and clinical information for diagnosing sarcopenia. The proposed method utilizes a non-local CAM enhancement technique to capture global long-range information and direct the network’s attention towards important regions generated by CAM. It also employs visual-text fusion to enhance the multi-modality feature representation and auxiliary contrastive representation to improve the discriminative representation ability of the model. The study employs a dataset from a hospital comprising 1,176 patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed model leverages both hip X-ray images and clinical information to diagnose sarcopenia.

    The authors introduce a multi-modality contrastive learning approach, which includes a Non-local CAM Enhancement module, a visual-text feature fusion module, and an auxiliary contrastive representation.

    The proposed approach achieves superior performance compared to the baseline models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I couldn’t find any major weakness.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible since it is clearly written, and the authors have shown the implementation details. Also, the authors intend to release the code. The authors did not mention if the dataset used for evaluation will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It will be helpful to release the anonymized dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical novelty, reproducibility and results achieved.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper is well written, and the contribution is summarized in the conclusion. The authors propose a multi-modality contrastive learning model for sarcopenia screening using hip X-ray images and clinical information. The proposed model consists of a Non-local CAM Enhancement module, a Visual-text Feature Fusion module, and an Auxiliary contrastive representation for improving the feature representation ability of the network. Moreover, the authors collect a large dataset for screening sarcopenia from heterogeneous data. Comprehensive experiments and explanations demonstrate the superiority of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – Clarity. The paper clearly presents the task background, challenges, existing methods, proposed workflow, experiments, and conclusion. Readers can get a general idea of the challenges in identifying sarcopenia from hip X-ray images and of the drawbacks of existing work. The general workflow and components are also easy to understand.
    – Comprehensiveness. The full configuration of the proposed model involves many components, which handle the processing of the different modalities and the feature fusion. The experiments include less complex models for performance comparison. The authors also implement many state-of-the-art models with varying input modalities and working logic. The t-SNE figure could convince readers of the effects of the components.
    – Novel application. The authors discuss the definition of sarcopenia and its challenges, and make some assumptions (muscle characteristics of sarcopenia, hip X-ray measurements, related clinical data, possible ways to extract useful information from hip X-ray). The use of Non-local CAM Enhancement, Visual-text Feature Fusion, and Auxiliary contrastive representation is appropriate and adequate. These components have reasonable capacity for sarcopenia diagnosis from hip X-ray and clinical parameters.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – Data distribution. The authors curate a large dataset consisting of 490 positive and 686 negative cases. If the application is sarcopenia screening, the experimental data do not represent the general population, since only 10% of people over the age of 60 will develop sarcopenia.
    – Lack of detailed ablation. The data curation and modeling process are clear and logical, and the experiments are well carried out. However, I would like to learn more about how sarcopenia cases appear, such as the typical appearance, distinguishing characteristics in the X-ray, and ambiguous cases. For ablation purposes, the authors may add some failure case analysis, explaining the room for improvement and the shortcomings of the proposed components.
    – Some unclear components. In Fig. 1 (Non-local CAM), what do C/2, H’W’, H’W’ mean? I could not find the definitions of H’, W’, and H’*W’.
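
    To make the data-distribution concern concrete, the following is a rough sketch of how precision (positive predictive value) would shift if the reported sensitivity and specificity were carried over unchanged to a population with 10% prevalence. That transfer is an assumption made only for illustration; the paper reports no such experiment, and the function name `ppv` and the variable names are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence                   # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test+, disease-)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as reported in the abstract.
SEN, SPC = 0.7206, 0.8606

# Prevalence in the curated dataset: 490 positive, 686 negative cases.
dataset_prevalence = 490 / (490 + 686)

# Prevalence quoted by the reviewer for people over 60 (assumed value).
screening_prevalence = 0.10

ppv_dataset = ppv(SEN, SPC, dataset_prevalence)
ppv_screening = ppv(SEN, SPC, screening_prevalence)

print(f"dataset prevalence:   {dataset_prevalence:.3f} -> PPV {ppv_dataset:.3f}")
print(f"screening prevalence: {screening_prevalence:.3f} -> PPV {ppv_screening:.3f}")
```

    Under these assumptions the PPV is about 0.79 at the dataset's class balance, close to the reported PRE of 78.44%, and drops to about 0.36 at 10% prevalence. This is why results in a general-population setting would be informative.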

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper uses an in-house dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The task and model are both well defined. Since the application is sarcopenia screening, I would like to see experiments and performance in a general-population setting.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity and soundness of the task description, proposed model, experiments, and conclusion. The authors work on sarcopenia examination through hip X-ray and clinical data, introducing novel model components to address the likely challenges. They demonstrate the system’s effectiveness with comprehensive experiments.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors proposed a new multimodal model that combines image and tabular data for sarcopenia screening, and achieved superior results on their collected dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors have collected a relatively large dataset that includes hip X-ray images and patients’ structural information for sarcopenia screening.
    2. The proposed method successfully integrates image and tabular data, resulting in superior performance in sarcopenia screening.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Several design motivations and important details of the proposed NLC (generate CAM + NLM), VFF, and ARC need to be clarified.
       1) CAM generation:
          i. Does the ResNet18 used to generate the CAM use parameters pre-trained on ImageNet, are the parameters taken from the “downstream main encoder” (the encoder in Fig. 1a), or did the authors first pre-train a ResNet18 on the training set?
          ii. To calculate Smooth Grad-CAM++, a target class and a loss function (to compute the gradient used in Smooth Grad-CAM++) are required. I assume the authors used cross-entropy loss here, so which classification head did they use to generate the prediction? (If the ResNet18 is pre-trained on the training set, the authors do not need to answer this question.) During inference, which class do the authors choose to calculate the loss for CAM? Do the authors always use the positive class to calculate the loss?
       2) NLM:
          i. There may be a typo in Fig. 1, as the output size of θ and ϕ should be C/2, H, W instead of C/2, HW, HW. Could the authors please confirm?
          ii. The authors used an unconventional way to calculate the attention matrix, as shown in Eq. 2. The motivation behind this design needs to be clarified, and an ablation study comparing the performance of this design with the conventional way (arxiv 1711.07971 Non-local NN) should be included.
       3) VFF: Why did the authors first concatenate the tabular feature with the image feature and then apply an attention mechanism across each spatial position? Wouldn’t a simple cross-attention between the image feature and the tabular feature make more sense here? The motivation behind this design needs to be clarified, and an ablation study should be conducted to support its superiority.
       4) ARC: Since the authors already have the label of each subject in the training set, a supervised contrastive loss (arxiv 2004.11362) seems more appropriate here. The authors should elaborate on why they chose an unsupervised contrastive loss, and an ablation study comparing it with the supervised contrastive loss should be included.

    2. Did the authors strictly adhere to the MICCAI2023 template? It appears that the font and line spacing differ from that specified in the template.

    3. The motivation of the work could be clarified further. As the authors mention that the muscle and fat regions on X-ray images are difficult to differentiate, in clinical workflows, why not use other advanced yet still highly accessible imaging techniques, such as low-dose computed tomography (LDCT)?

    4. Since the collection of the dataset is listed as one of the major contributions, do the authors intend to release the dataset after the paper’s publication?
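
    For reference on point 1.2 above, the following is a minimal NumPy sketch of the conventional embedded-Gaussian non-local block from arxiv 1711.07971, with the tensor shape noted at each step. It is not the authors' Eq. 2 or their implementation. The function and weight names are hypothetical, the 1x1 convolutions are written as plain matrix products over flattened pixels, and batch dimension, bias, and normalization layers are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block for one feature map x of shape (C, H, W)."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)               # (C, HW)
    # A 1x1 convolution is a per-pixel linear map, so each embedding is
    # (C/2, H, W) before flattening and (C/2, HW) after.
    theta = w_theta @ flat                   # (C/2, HW)
    phi = w_phi @ flat                       # (C/2, HW)
    g = w_g @ flat                           # (C/2, HW)
    # Pairwise similarity between all positions, normalized per query row.
    attn = softmax(theta.T @ phi, axis=-1)   # (HW, HW)
    # y[:, i] = sum_j attn[i, j] * g[:, j]
    y = g @ attn.T                           # (C/2, HW)
    out = w_out @ y                          # (C, HW)
    return x + out.reshape(C, H, W), attn    # residual connection


rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
x = rng.standard_normal((C, H, W))
w_theta = rng.standard_normal((C // 2, C))
w_phi = rng.standard_normal((C // 2, C))
w_g = rng.standard_normal((C // 2, C))
w_out = rng.standard_normal((C, C // 2))

out, attn = non_local_block(x, w_theta, w_phi, w_g, w_out)
print(out.shape, attn.shape)
```

    In this formulation only the attention matrix has size HW x HW. The θ and ϕ embeddings are C/2 x H x W, or C/2 x HW once flattened, which is the distinction the question about Fig. 1 turns on.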

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If the authors share the code, as announced in the paper, it should be possible to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to question 6.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    If the concerns listed in question 6, particularly the first two major concerns, can be adequately addressed and clarified, I will increase the rating.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: This paper proposes a multi-modality contrastive learning model for sarcopenia screening using hip X-ray images and clinical information. A large dataset for screening sarcopenia from heterogeneous data is collected.

    Weaknesses:

    1. Several design motivations and important details of the proposed NLC should be clarified.
    2. The ablation study should be conducted to verify the effectiveness of the proposed method.




Author Feedback

N/A


