
Authors

Junyu Li, Han Huang, Dong Ni, Wufeng Xue, Dongmei Zhu, Jun Cheng

Abstract

Early diagnosis of renal cancer can greatly improve the survival rate of patients. Contrast-enhanced ultrasound (CEUS) is a cost-effective and non-invasive imaging technique and has become more and more frequently used for renal tumor diagnosis. However, the classification of benign and malignant renal tumors can still be very challenging due to the highly heterogeneous appearance of cancer and imaging artifacts. Our aim is to detect and classify renal tumors by integrating B-mode and CEUS-mode ultrasound videos. To this end, we propose a novel multi-modal ultrasound video fusion network that can effectively perform multi-modal feature fusion and video classification for renal tumor diagnosis. The attention-based multi-modal fusion module uses cross-attention and self-attention to extract modality-invariant features and modality-specific features in parallel. In addition, we design an object-level temporal aggregation (OTA) module that can automatically filter low-quality features and efficiently integrate temporal information from multiple frames to improve the accuracy of tumor diagnosis. Experimental results on a multicenter dataset show that the proposed framework outperforms the single-modal models and the competing methods. Furthermore, our OTA module achieves higher classification accuracy than the frame-level predictions.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_62

SharedIt: https://rdcu.be/dnwH8

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multimodal method to deal with renal tumor diagnosis. The network includes an attention-based fusion module and an object-level temporal aggregation module which are the main contribution of this method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. As claimed, this paper is the first deep learning framework that integrates both B-mode and CEUS-mode information for renal tumor diagnosis.
    2. The writing is easy to follow.
    3. The performance of the proposed method is good.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Novelty: The major technical contributions of this work are the proposed AMF module and the OTA module. The AMF module is a combination of self-attention and cross-attention applied to the features of the two modalities. The OTA module is self-attention on the frame features. The novelty of the method is not significant.
    2. Reproducibility: The method is quite complicated since the training procedure includes three steps. However, the code or model is not provided. Besides, the dataset is collected internally and not publicly available.
    3. Experiments: In section 3.3, only the detection performance is compared, while the diagnosis performance is ignored.

    Minor questions:

    1. Is F a whole feature map or a feature point in equation (1)? Is the attention module applied to the features with the same spatial locations?
    2. In the OTA module, since the number of input frames is not fixed, how are the final classification results generated?
    3. The class labels are provided by radiologists. Can these labels be regarded as a gold standard to measure the performance of the model?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors use an internal dataset. In the checklist, they indicate that the code, data, and model will be provided. However, this information does not appear in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors should ensure reproducibility by providing a detailed description of the methods, or code and data. Also, the missing experiments should be added as mentioned above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is based mainly on novelty, reproducibility, and experiments.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes an attention-based multi-modal ultrasound video fusion network for renal tumor diagnosis, achieving good performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper presents a novel framework for the specific task. The method is solid, sound, and novel, with simple and elegant design of deep learning model.
    2. The paper has very comprehensive evaluations and ablation studies, with sufficient implementation details included, confirming the value of the proposed method.
    3. The manuscript is generally well and clearly written.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No major weakness noticed

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. If possible, include a comparison of model efficiency (parameters, FLOPs) with the competing methods.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a very solid and well-written paper for the domain. I don’t have much more to comment on.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a multi-modal ultrasound video fusion network for renal tumor diagnosis that provides an innovative and effective way to improve patient outcomes. The authors’ approach demonstrates clear superiority over state-of-the-art methods and includes an innovative Object-Level Temporal Aggregation (OTA) submodule. This contribution can be inspirational in the field of renal tumor diagnosis and has the potential to be generalized to other domains of tumor detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a multi-modal ultrasound video fusion network for renal tumor diagnosis. The proposed method fuses B-mode and CEUS-mode ultrasound videos to enhance renal tumor diagnosis. The authors present sufficient ablation and comparison studies, demonstrating the effectiveness of the proposed approach. The study compares video-based diagnosis on accuracy and F1, and detection results on AP50 and AP75. The results show that the proposed method outperforms state-of-the-art methods, indicating its potential for improving patient outcomes.

    The Object-Level Temporal Aggregation (OTA) submodule is a novel approach to filtering low-quality features in renal tumor diagnosis. By fusing different levels of information abstraction, OTA achieves the goal of automatically filtering low-quality features. The submodule is a significant contribution to the field of medical image analysis, with the potential to improve the accuracy of tumor detection. The authors’ innovative approach is well-organized and approachable, making it easy to understand and replicate the proposed methodology.

    Overall, the paper is a valuable contribution to the field of medical image analysis. The proposed approach provides a new and effective way to address the challenging task of renal tumor diagnosis. The study presents a clear demonstration of the effectiveness of the proposed approach, with the potential to improve the accuracy and efficiency of renal tumor diagnosis. Future studies could benefit from exploring the potential applications of this approach to other medical domains.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper has some weaknesses that need to be addressed. Firstly, the fusion of information is not novel enough in general research. However, it is not a critical issue since the proposed solution to the problem is significant. Future studies could benefit from exploring new and innovative ways to fuse information that could improve the accuracy and efficiency of renal tumor diagnosis.

    Secondly, the model complexity is a bit too high for researchers and practitioners to follow or reproduce. For example, the AMF module and PA-FPN require crossing and fusion of attention, which can potentially cause computational complexity issues. The parallel structure for both modalities also has a higher cost. The authors could address this weakness by simplifying the model structure or providing more details and resources to help other researchers understand and replicate the proposed methodology.

    [1] Zhou T, Ruan S, Vera P, et al. A Tri-Attention fusion guided multi-modal segmentation network[J]. Pattern Recognition, 2022, 124: 108417.
    [2] Ye Y, Ren X, Zhu B, et al. An adaptive attention fusion mechanism convolutional network for object detection in remote sensing images[J]. Remote Sensing, 2022, 14(3): 516.
    [3] Praveen R G, de Melo W C, Ullah N, et al. A joint cross-attention model for audio-visual fusion in dimensional emotion recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2486-2495.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe that this paper has the potential to contribute to the community through its reproducibility, especially if the authors open-source their code and software. This would allow other researchers to easily replicate their results and build upon their findings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    (1) The authors could simplify the model architecture by exploring alternative approaches to fusion, such as feature concatenation or feature selection.

    (2) The authors could provide more detailed analysis of how the proposed approach performs on specific subtypes of renal tumors, which could help to identify potential limitations and lead to further improvements.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Significance in specific field, model complexity, and motivation of this work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes an attention-based multi-modal ultrasound video fusion network for renal tumor diagnosis, and the experimental results show good performance. However, the paper still has the following issues: 1) The novelty of the method is not significant; 2) The diagnosis performance is ignored; 3) It is not clear whether the attention module is applied to features at the same spatial locations; 4) Model efficiency (parameters, FLOPs) should be discussed; 5) It is unclear how the final classification results are generated; 6) More discussion of how the proposed method performs on specific subtypes of renal tumors is needed.




Author Feedback

We thank the meta-reviewer and the reviewers for their constructive comments. We are delighted by the many positive comments noting that our work presents a novel and effective framework for renal tumor diagnosis (R2, R3), has very comprehensive evaluations and ablation studies (R2, R3), demonstrates clear superiority over SOTA methods (R1, R2, R3), and is well written (R1, R2). In the following, we address the reviewers’ main concerns as summarized by the AC. We will release our code and data, and provide a link in our revised paper.

Q1: Method Novelty. (R1) A: The proposed solution is significant for renal tumor diagnosis based on multimodal ultrasound videos. To the best of our knowledge, this is the first deep-learning-based study integrating multimodal ultrasound information for enhanced renal tumor diagnosis. Furthermore, we tailor different functional modules to form a novel multi-modal video fusion framework, including a SOTA network as the backbone, the proposed AMF module to fuse B-mode and CEUS-mode information, and the proposed OTA module to aggregate information from multiple frames. Finally, we present sufficient ablation and comparison studies on a multicenter dataset to demonstrate the effectiveness of our design. As noted by the reviewer, our work can be inspirational in the field of renal tumor diagnosis and has the potential to be generalized to other domains of tumor detection.

Q2: Diagnosis performance is ignored. (R1) A: In fact, the average precision (AP) metric reflects the diagnosis performance, since a true positive detection requires a correctly predicted class label. For a more direct comparison, we have further calculated accuracy and F1 score. Compared with the four SOTA multimodal fusion methods, our method achieved the best results. On the validation set, our method achieved acc=0.840 and F1 score=0.840, while the best competing method (CEN) had acc=0.830 and F1 score=0.830. On the test set, our method achieved acc=0.909 and F1 score=0.900, while the best competing method (CMF) had acc=0.878 and F1 score=0.868.
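The point that AP already encodes diagnosis can be illustrated with a minimal sketch. It assumes boxes in (x1, y1, x2, y2) form and a binary F1 with "malignant" as the positive class; the rebuttal does not state the box format or the F1 averaging used, and the function names `iou`, `is_true_positive`, and `accuracy_and_f1` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, iou_threshold=0.5):
    """A detection counts only if the class matches AND the boxes overlap enough.

    iou_threshold=0.5 corresponds to AP50 and 0.75 to AP75.
    """
    same_class = pred["label"] == truth["label"]
    return same_class and iou(pred["box"], truth["box"]) >= iou_threshold


def accuracy_and_f1(y_true, y_pred, positive="malignant"):
    """Video-level accuracy and binary F1 (assumption: malignant is positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

Under this matching rule, a well-localized box with the wrong benign/malignant label is a false positive, which is why a detector cannot score a high AP while misclassifying tumors.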

Q3: Is the attention module applied to the features with the same spatial locations? (R1) A: The attention module is applied to all features in the whole feature map F of size H × W × C. Specifically, we flatten F into H × W vectors of length C and map each vector into q (query), k (key), and v (value). Then we use scaled dot-product attention to encode the flattened F. After the attention process, the obtained encoding is unflattened back into a feature map. The attention mechanism allows global interaction among the feature vectors at different spatial locations.
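The flatten, attend, and unflatten steps described in this answer can be sketched as follows. This is a dependency-free illustration, not the authors' implementation: it assumes a single attention head, square C × C projection matrices (`w_q`, `w_k`, `w_v`, all hypothetical names), and nested Python lists in place of tensors.

```python
import math


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feature_map, w_q, w_k, w_v):
    """Scaled dot-product self-attention over an H x W x C feature map."""
    height, width = len(feature_map), len(feature_map[0])
    # Flatten H x W x C into H*W token vectors of length C.
    tokens = [feature_map[i][j] for i in range(height) for j in range(width)]
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))

    encoded = []
    for q in queries:
        # Each location attends to every location, not only its own.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        encoded.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])

    # Unflatten the H*W encoded vectors back into an H x W map.
    return [encoded[i * width:(i + 1) * width] for i in range(height)]
```

Because every query is compared with every key, the attention weights form an (H·W) × (H·W) matrix, which is what gives the global interaction across spatial locations mentioned in the answer.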

Q4: Model efficiency. (R2) A: We will include the comparison of model efficiency in Tab. 4. The FLOPs and number of parameters (M) of our model are 117.69 and 66.76, respectively, which are comparable to those of the four competing methods: 97.85 and 45.45 for CMML, 127.01 and 187.06 for CEN, 103.53 and 51.26 for CMF, and 109.19 and 57.46 for TMM.

Q5: How to generate the final classification results? (R1) A: During training and inference, the number of input frames is fixed. The OTA module aggregates the features of a fixed number of frames, and the output feature vector is fed into a classification head (MLP) to generate the final classification results. We will make this clearer in Section 2.3.
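The pipeline in this answer (a fixed number of frame features, aggregation into one vector, then a classification head) can be sketched as below. Several parts are assumptions made for illustration only: softmax-weighted pooling stands in for the OTA module, whose internals are not specified here, and a single linear layer stands in for the MLP head. All function and parameter names are hypothetical.

```python
import math


def aggregate_frames(frame_features, score_weights):
    """Score each frame, softmax the scores, and return the weighted sum.

    Frames with low scores receive small weights, a simple stand-in for
    filtering low-quality features.
    """
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frame_features]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frame_features)) for d in range(dim)]
    return pooled, weights


def classify(pooled, head_weights, head_bias):
    """Linear head (assumption: the real head is an MLP); returns (label, logits)."""
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(head_weights, head_bias)
    ]
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, logits


def diagnose(frame_features, num_frames, score_weights, head_weights, head_bias):
    """One prediction per video from exactly num_frames frame features."""
    if len(frame_features) != num_frames:
        raise ValueError(f"expected {num_frames} frames, got {len(frame_features)}")
    pooled, weights = aggregate_frames(frame_features, score_weights)
    label, _ = classify(pooled, head_weights, head_bias)
    return label, weights
```

Because the frames are pooled into a single vector before classification, the head always receives an input of the same size, so one label is produced per video rather than one per frame.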

Q6: How does the proposed method perform on specific subtypes of renal tumors? (R3) A: We did not previously collect information about specific tumor subtypes, since our primary goal was to perform benign/malignant classification. The incidence of different renal tumor subtypes varies greatly. For example, clear cell renal cell carcinoma (RCC) accounts for ~75% of RCC. For an unbiased performance evaluation on specific subtypes, especially the rare ones, a larger dataset is needed, which we are considering for our future work.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a multi-modal ultrasound video fusion network that can effectively perform multi-modal feature fusion and video classification for renal tumor diagnosis, and the experimental results have shown the effectiveness of the proposed method. Overall, the rebuttal has addressed the main concerns of reviewers.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the reviewers have not responded to the rebuttal or changed their scores, I read the rebuttal and I think the authors addressed the majority of the concerns.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Nice paper with thorough evaluation. The authors have responded elegantly to the comments from the reviewers. Their rebuttal is well laid out and to the point. They still need to add the diagnostic metrics in the camera-ready version along with the model efficiency.


