
Authors

Menglei Jiao, Hong Liu, Jianfang Liu, Hanqiang Ouyang, Xiangdong Wang, Liang Jiang, Huishu Yuan, Yueliang Qian

Abstract

The multi-modal fusion of medical images has been widely used in recent years. Most methods focus on images in a single plane, such as the axial plane with different sequences (T1, T2) or different modalities (CT, MRI), rather than multiple planes with or without cross-modality data. Further, most methods perform segmentation or classification at the image or sequence level rather than the patient level. This paper proposes a general and scalable framework named MAL for the classification of benign and malignant tumors at the patient level based on multi-modal attention learning. A bipartite graph is used to model the correlations between different modalities, and modal fusion is then carried out in feature space by attention learning and multi-branch networks. Thereafter, multi-instance learning is adopted to obtain patient-level diagnostic results by treating the different modal pairs of a patient's images as bags and the edges in the bipartite graph as instances. The modal and intra-type similarity losses at the patient level are calculated from the feature similarity matrix to encourage the model to extract high-level semantic features with high correlation. The experimental results confirm the effectiveness of MAL on three datasets covering different multi-modal fusion tasks: axial and sagittal MRI, axial CT and sagittal MRI, and T1 and T2 MRI sequences. The application of MAL can also significantly improve the diagnostic accuracy and efficiency of doctors. Code is available at https://github.com/research-med/MAL.
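As a rough, hypothetical sketch of the pairing scheme described above (the function and variable names, the linear edge classifier, and the mean pooling are our assumptions; the authors' actual implementation is in the linked repository), the following PyTorch snippet builds the complete bipartite graph between two modalities and pools edge-level predictions into a patient-level score:

    # Minimal sketch (not the authors' code): the edges of a complete bipartite
    # graph between two modalities act as MIL instances, and a patient-level
    # score is pooled over the whole bag of edges.
    import torch

    def patient_level_score(feats_a, feats_b, edge_classifier):
        """feats_a: [Na, D] slice features of modality A (e.g., axial CT);
        feats_b: [Nb, D] slice features of modality B (e.g., sagittal MRI)."""
        na, nb, d = feats_a.size(0), feats_b.size(0), feats_a.size(1)
        a = feats_a.unsqueeze(1).expand(na, nb, d)             # repeat A along B axis
        b = feats_b.unsqueeze(0).expand(na, nb, d)             # repeat B along A axis
        edges = torch.cat([a, b], dim=-1).reshape(-1, 2 * d)   # one row per graph edge
        logits = edge_classifier(edges)                        # instance-level scores
        return logits.mean()                                   # simple MIL mean pooling

    # Usage with stand-in features and a linear edge classifier:
    clf = torch.nn.Linear(2 * 64, 1)
    score = patient_level_score(torch.randn(5, 64), torch.randn(7, 64), clf)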

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_17

SharedIt: https://rdcu.be/cVRs3

Link to the code repository

https://github.com/research-med/MAL

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a multi-modal attention learning method for patient-level diagnosis of benign and malignant tumors, using a bipartite graph structure to model the correlations between different modality data. They also propose a patient-level modal similarity loss function and an intra-type similarity loss function (PiTSLoss).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of multi-modal attention learning is interesting and addresses an important research problem.
    • The authors used a relatively large clinical dataset to demonstrate the performance of their proposed method.
    • Extensive experiments were performed and high performance was achieved.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The writing, in my opinion, is not very good and could be improved over what is presented in the manuscript.
    • There are many long sentences, and in some places the meaning is unclear.
    • The lack of results presented as figures is one weakness; I would suggest including the figures from the supplementary material in the main manuscript.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The datasets used in this work are private clinical data which are not public.
    • Authors mentioned their code will be publicly available on Github.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Besides the comments I mentioned in section 5 (weaknesses), I have the following comments. The abstract is not very informative: there is no information about the datasets or the achieved accuracy.

    • The selection of axial slices from CT and sagittal slices from MRI is not justified. Would it make a difference if these planes were reversed? Also, which MRI sequences were used, and would it be better to use all MRI sequences?
    • The authors mentioned that they used the average value of the CrossEntropy loss, PMSLoss, and PiTSLoss. Did you explore different weightings of these terms?
    • Many abbreviations are not defined before use (e.g., CT, MRI, and others).
    • Typos appear in many places, e.g., “topk”, “Experiment and Result”, …
    • References are a bit old. Below are some suggested references that authors may use to update their reference list: https://doi.org/10.3389/fgene.2021.690049 https://doi.org/10.1016/j.media.2022.102444 https://doi.org/10.1016/j.compbiomed.2021.104836 https://doi.org/10.1007/978-3-030-87199-4_10
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See my comments above.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper proposes a multi-modal attention learning framework, MAL, for tumor diagnosis. A bipartite graph structure is used to model the correlations between different modality data, and the edges of the graph are predicted through attention learning and a multi-branch network. The modal similarity and intra-type similarity losses are calculated from the feature similarity matrix to extract better high-level semantic features. Multi-instance learning is used to obtain the final patient-level diagnosis results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Images of different modalities and different planes are considered at the same time. 2) Using a bipartite graph structure to model the correlations between different modality data is novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The proposed method was evaluated only on three spine tumor datasets, while the authors emphasize its general applicability to tumor diagnosis in the title, abstract, and conclusion.
    2) It is mentioned in the paper that PMSLoss encourages the model to extract similar high-level semantic features from different modalities; however, features of different modalities may be complementary and thereby improve classification performance, as is also demonstrated in Fig. 1 of the supplementary materials. Is this contradictory? Did PMSLoss suppress complementary features during training?
    3) The experiments are not convincing enough. In Table 1, when comparing the MRI-Axi&Sag results of the proposed MAL method (last row) with the method without the AB and TS modules (fifth row), the AUC is similar while the ACC and SP of the latter are higher than those of the former.
    4) The authors state that the convolutional neural network branch and the attention auxiliary branch extract local and global features, respectively; are there any visual results or other experiments to further support this?
    5) The writing of this paper is extremely poor, with numerous grammatical mistakes.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation details have been described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1) Page 1, Introduction, Paragraph 2: “Multi-modality methods can improve the diagnostic accuracy by multi-modal data analysis and model construct …” should be “… by multi-modal data analysis and model construction …”.
    2) Page 2, Paragraph 1: “The performance of these multi-modal methods are considerably improved …” should be “The performance of these multi-modal methods is considerably improved …”. Besides, please provide references with numeric results.
    3) Page 2, Paragraph 2: “… for feature fusion to segment brain tumor” should be “… for feature fusion to segment brain tumors”.
    4) Page 3, Fig. 1: Please explain the meaning of the green line. How are the regions of interest obtained and cropped from the original images?
    5) Page 4, Paragraph 1: “patches with 8×8 pixels” should be “patches of 8×8 pixels”.
    6) Page 5: “3 Experiment and Result” should be “3 Experiments and Results”.
    7) Page 6, Paragraph 1: “The benign and malignant labels of each patient” should be “The benign and malignant labels of patients”.
    8) Page 6, Paragraph 6: “Result of different methods” should be “Results of different methods”.
    9) Page 7, Table 1: the font of the first row is inconsistent.
    10) Page 7, Paragraph 2: “it still can reveal …” should be “it can still reveal …”.
    11) Page 7, Paragraph 4: “Compare with doctors” should be “Comparison with doctors”.
    12) Page 8, Paragraph 2: “The diagnostic accuracy and efficiency of doctors have greatly improved …” should be “The diagnostic accuracy and efficiency of doctors have been greatly improved …”.
    13) Page 8, Conclusion: “The attention learning and multi-branch strategy in MIL are used …” should be “The attention learning and multi-branch strategies in MAL are used …”.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method was evaluated only on three spine tumor datasets, while the authors emphasize its general applicability to tumor diagnosis in the title, abstract, and conclusion. The experiments are not convincing enough.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    This paper proposes a multi-modal attention learning framework for patient-level tumor malignancy classification. The results show the effectiveness of the proposed method, comparable to, and in some cases exceeding, doctor performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) An interesting multimodal fusion framework based on a bipartite graph and attention learning to explore the correlations between different modalities; 2) the PMSLoss/PiTSLoss seem to encourage the model to learn the inherent similarity between features from different modalities at the patient level; 3) the results show the effectiveness of this approach through ablation studies and comparison to doctors.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) In the experimental results, it would be good to compare with the existing SOTA method; it is unclear from Table 1 which method is the current SOTA.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is claimed to be available on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Adding a comparison with the current SOTA method could further strengthen the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The multimodal learning framework seems to be novel for the MRI/CT data. The results in comparison with doctors seem to be good.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Major strengths/contributions are a novel multi-modal attention learning framework to fuse information across multiple modalities, with evaluation on two cohorts, ablation studies, as well as comparison to experts.

    • Some quantitative data should be added to the abstract
    • Clarification in the abstract that the application is primarily to spine data
    • Better explanation of statistical testing
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We would like to thank all reviewers and the meta-reviewer for their constructive comments and for acknowledging the contributions of this work. We will revise our paper according to the reviews in the camera-ready version. Here, we focus on addressing the major concerns and questions.

R#1: In the experiments of the main paper, we focus on the fusion of different scanning modalities and different planes; the experimental results for different sequences are shown in the supplementary materials. As mentioned in the paper, we construct a complete bipartite graph between axial CT and sagittal MRI, that is, each axial CT image is paired with each sagittal MRI image, and we report representative results in our experiments. In addition, for the CrossEntropy loss, PMSLoss, and PiTSLoss, we simply averaged them and obtained good results. Using adaptive weights may work better; we will conduct experiments for verification and analysis in the future.
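For concreteness, here is a minimal sketch of the unweighted loss averaging described above. The bodies of the pms and pits terms below are illustrative placeholders, not the paper's definitions (which are computed from the patient-level feature similarity matrix):

    # Hedged sketch of averaging the three losses; pms/pits are stand-ins,
    # not the actual PMSLoss/PiTSLoss formulations from the paper.
    import torch.nn.functional as F

    def total_loss(logits, labels, feats_a, feats_b):
        ce = F.cross_entropy(logits, labels)
        # Placeholder modal-similarity term: pull paired modal features together.
        pms = 1 - F.cosine_similarity(feats_a, feats_b, dim=-1).mean()
        # Placeholder intra-type term (stand-in for PiTSLoss).
        pits = F.mse_loss(feats_a, feats_b)
        return (ce + pms + pits) / 3  # plain average; no learned weights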

R#4: We conducted experiments on three datasets with multi-modal data at different levels: the fusion of different scanning modalities and different planes, the fusion of the same scanning modality and different planes, and the fusion of different MRI sequences. The experimental results show that our model can effectively handle different types of multi-modal fusion datasets. In addition, the purpose of PMSLoss is to make the features extracted from different modal data of the same patient as similar as possible. The "similarity" here refers to encouraging the model to extract, as much as possible, the internal correlations and complementary relationships between features of different modal data. In our MAL, the convolutional neural network branch extracts features through convolution kernels and therefore cannot attend to the global information of the image at lower feature scales, while the attention-assisted branch can attend to global information at lower scales through matrix operations over the whole image. The experimental results show the effectiveness of our proposed MAL method.
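As an illustration of this two-branch idea (our own hypothetical sketch with assumed layer sizes, not the paper's architecture), a convolution branch captures local structure through small kernels, while a self-attention branch over 8×8 patch tokens mixes information globally even at low feature scales:

    # Hypothetical two-branch sketch: local features via convolution, global
    # features via self-attention across 8x8 patch tokens of the same image.
    import torch
    import torch.nn as nn

    class TwoBranch(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.local = nn.Conv2d(1, dim, kernel_size=3, padding=1)  # local receptive field
            self.embed = nn.Conv2d(1, dim, kernel_size=8, stride=8)   # 8x8 patch tokens
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, x):                                  # x: [B, 1, H, W]
            local = self.local(x)                              # [B, dim, H, W]
            tokens = self.embed(x).flatten(2).transpose(1, 2)  # [B, HW/64, dim]
            global_feats, _ = self.attn(tokens, tokens, tokens)  # each token attends to all
            return local, global_feats

    feats_local, feats_global = TwoBranch()(torch.randn(2, 1, 64, 64))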

R#5: In our experiments, multi-modal spine tumor datasets collected from a cooperating hospital were used. Almost all existing public multi-modal datasets are based on single frames or single sequences. There are almost no public datasets similar to our task (fusing different modal types for patient-level diagnosis), so we could not find existing SOTA results for direct comparison. In our experiments, all models adopt their original hyperparameter configurations to ensure a fair comparison of all methods.


