
Authors

Zhonghang Zhu, Liansheng Wang, Baptiste Magnier, Lei Zhu, Defu Zhang, Lequan Yu

Abstract

Multi-modality 3D medical images play an important role in clinical practice, as each modality captures specific characteristics of the underlying anatomical information. Because of its effectiveness in exploiting the complementary information among different modalities, multi-modality learning has attracted increasing attention recently and can be realized with Deep Learning (DL) models. However, it remains a challenging task for two reasons. First, the prediction confidence of a multi-modality learning network cannot be guaranteed when the model is trained with volume-level labels, which provide only weak supervision for learning 3D information. Second, it is difficult to effectively exploit the complementary information across modalities while preserving the modality-specific properties during fusion. In this paper, we present a novel Reinforcement Learning (RL) driven approach to comprehensively address these challenges, where an independent learning mechanism is proposed to choose reliable and informative features within each modality and to explore complementary representations across modalities under the guidance of dynamic weights. In particular, two Recurrent Neural Network (RNN) based agents are utilized for representation learning within a single modality (intra-learning) and among different modalities (inter-learning); they are trained via Proximal Policy Optimization (PPO) with the confidence increment of the prediction as the reward. To validate the proposed method, we take 3D image classification as an example and conduct experiments on a multi-modality brain tumor MRI dataset. Experimental results show that the classification performance is improved by 5.9% when employing the proposed RL-based multi-modality representation learning.
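(The abstract describes the reward only qualitatively. As a minimal illustrative sketch, assuming the confidence increment is measured as the change in the classifier's softmax probability for the ground-truth class before and after the agent's feature enhancement, the reward could look roughly like the following; all names are hypothetical and not taken from the paper.)

```python
import torch.nn.functional as F

def confidence_increment_reward(logits_before, logits_after, labels):
    """Hypothetical PPO reward: increase in the predicted probability of the
    ground-truth class after the agent enhances the features."""
    p_before = F.softmax(logits_before, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    p_after = F.softmax(logits_after, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    return p_after - p_before  # positive when the enhancement raises confidence
```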

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_58

SharedIt: https://rdcu.be/cVRuJ

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper presents a novel Reinforcement Learning (RL) driven approach to obtain semantically meaningful intra-modality features and complementary inter-modality features. This is achieved via dynamic weighting using RNNs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written. All sections of the paper are well explained. The idea of combining reinforcement learning with inter- and intra-modality representation learning is novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Experiments are shown on only one dataset, which limits the demonstrated generalizability. 2) The dataset used is too small. 3) No qualitative results are shown on how the semantic inter-/intra-modality networks are performing.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is generic mathematically, but reproducibility is not demonstrated in the paper. With a different dataset and task, other challenges may arise, which makes it hard to believe that the method is easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1) Consider discussing some qualitative results. 2) In Table 2, please put the best-performing numbers in bold. 3) Consider adding another dataset in the future.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is novel and strong. Paper is well written.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I accept the answer about the additional LNDb dataset and the possible generalizability; however, the concern about qualitative results was not adequately addressed in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    This paper introduces RL into intra-modality and inter-modality learning and proposes a novel hierarchical feature enhancement framework for multi-modality learning. The results demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a novel RL based method for 3D image classification:

    • An RL module is used to learn the most discriminative features from each modality.
    • Another RL module is used to learn to weight the features from different modalities.

    • Experiments are extensive. The authors not only compared with several other methods but also conducted comprehensive ablation studies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major concerns:

    • I totally agree with the challenges of multi-modality learning (paragraph 1 of Introduction). Could the authors show that the proposed method actually addressed the problems, for example, with feature visualization?

    • Is it possible that the accuracy boost is due to the increase in model size? The authors compared with several other models with different architectures, and hence different model complexities. Is it possible that a larger/smaller baseline model could outperform the proposed method?

    • The organization of the paper is very poor. Without reading the experiments section, I had no idea what the task is or why it needs to be solved by reinforcement learning.

    • What are the actions of the agents, a continuous number? According to Fig. 1, the action of the intra-agent is used to modulate an intermediate layer of the modality-specific network. But how?

    • Questions regarding the experiments:

      • The authors preprocessed the input images and used only the lesion area for classification. This is concerning, as the whole process can then be regarded as a semi-automatic method.
      • The training procedure seems very strange to me. Why three optimizers? Did the authors use only fixed learning rates? If so, is it possible that a different learning rate would greatly boost the prediction accuracy?
      • The authors mentioned that they used 10-fold cross-validation. However, they also mentioned that the best model is used to evaluate the testing set. This seems contradictory to me.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will open source the code. It’s reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. Please properly motivate the use of RL for classification and show how it addresses the mentioned challenges.

    2. This paper is not properly motivated. After introducing related works, the authors suddenly jump to discussing the novelties. I am somewhat lost as to why RL is used for classification and why it can address the mentioned challenges.

    3. Please properly formulate the problem before introducing the detailed model structure.

    4. Are the comparison methods the state of the art? If not, please compare with the top methods from the open challenges.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper doesn’t meet the standard of MICCAI yet.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    This paper presents a novel Reinforcement Learning (RL) driven approach to comprehensively address the challenges of multi-modality learning, where an independent learning mechanism is proposed to choose reliable and informative features within each modality and to explore complementary representations across modalities with the guidance of dynamic weights.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper introduces the RL strategy into multi-modality learning. (2) An iterative hybrid-enhancement network is proposed to integrate intra-features and inter-features. (3) The overall structure is clear and well-organized. (4) Experimental results demonstrate the superiority of the proposed method against other existing methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) For the proposed model, how are the final classification results obtained? (2) Comparing M2Net with the proposed model, both adopt a modality-specific network and a shared network. What, then, are the main differences and advantages? More discussion should be included. (3) Multi-modality fusion is a wide research topic, so related fusion methods and multi-modality learning algorithms should be discussed. (4) The dataset looks small, with only 165 subjects.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is public. The authors mention in the reproducibility statement that they will release code and trained models after acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) For the proposed model, how are the final classification results obtained? (2) Comparing M2Net with the proposed model, both adopt a modality-specific network and a shared network. What, then, are the main differences and advantages? More discussion should be included. (3) The dataset looks small, with only 165 subjects. This is a major limitation in medical imaging and a data issue that other researchers also face, so small-sample issues and some related works should be discussed. It would also be helpful to discuss where the proposed model fails to predict some samples.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper innovatively introduces the reinforcement learning strategy into intra-modality and inter-modality learning and also presents a novel hierarchical feature enhancement framework for multi-modality learning.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    o A novel Reinforcement Learning (RL) driven approach is proposed in this paper to obtain semantically meaningful intra-modality features and complementary inter-modality features. In general, this paper is well written and the idea is interesting. However, the paper still has the following issues: 1) The experiments are performed on only one small dataset, which limits the demonstrated generalizability; 2) There are no qualitative results on how the semantic inter-/intra-modality networks perform; more discussion with qualitative results is desired; 3) Feature visualization may be needed to show whether the proposed method actually addresses the stated problems; 4) The baseline model should be explored more thoroughly; 5) The organization of the paper is poor; 6) It is unclear how the proposed model obtains the final classification results; 7) More discussion of M2Net and multi-modal fusion is needed.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We appreciate the favorable comments on our interesting and novel method design (MR, R2, R3, R4) and the well-written, clear structure (MR, R2, R4). Below, we respond to the main issues raised by the reviewers, which we believe can be addressed in the final version.

#Meta-reviewer
Q1: Evaluation and small dataset. A1: We use the BraTS18 dataset following M2Net (Tao Zhou et al.). Due to the difficulty of acquiring a large amount of 3D multi-modality medical data, we further conduct intra-modality evaluations on LNDb. We conducted extensive analysis on these datasets, as noted by R3, to demonstrate the generalization of our method.

Q2&3: Qualitative discussions and feature visualization. A2&3: We visualize the learned weights for inter-modality fusion and find that the agent tends to pay more attention to the FLAIR modality. We cannot show these visualization results here, as we can only provide text in the rebuttal.

Q4: Influence of baseline model size. A4: As mentioned in Section 3.2, our framework outperforms 3D CBAM, which shares the same feature backbone (baseline) network with ours, indicating that the accuracy boost comes from the RL-agent module rather than from an increase in backbone size.

Q5: Paper organization. A5: We organize the paper by first formulating the problem as multi-modality 3D medical data fusion for classification, then introducing the key idea of the RL-based approach, and finally elaborating the detailed method design. Per your suggestion, we will further highlight the problem formulation and motivation in the final version.

Q6: Final classification results. A6: As illustrated in Section 2.3, the final classification result is obtained by picking the class with the largest softmax probability from the final output vector, which is the sum of O_T1, O_T1ce, O_T2, O_FLAIR and the fusion output O_fusion.
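(A minimal sketch of this decision rule, assuming each O_* is a per-class output vector of shape (batch, num_classes); function and variable names are illustrative.)

```python
import torch

def final_prediction(o_t1, o_t1ce, o_t2, o_flair, o_fusion):
    """Sum the four modality-specific outputs and the fusion output,
    then pick the class with the largest softmax probability."""
    o_final = o_t1 + o_t1ce + o_t2 + o_flair + o_fusion   # (batch, num_classes)
    return torch.softmax(o_final, dim=1).argmax(dim=1)     # predicted class indices
```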

Q7: Difference with M2Net and other multi-modal fusion. A7: M2Net projects the input 3D volumes into 2D images and extracts modality-specific features using a 2D network. In contrast, our method assigns weights generated by RL agents (supervised by the prediction confidence) to 3D features and extracts modality-specific features using a 3D network, which can generate more reliable and discriminative representations. Compared with other common multi-modal fusion methods, our method utilizes the prediction confidence as the supervision of the RL agents to generate weights for feature enhancement, rather than concatenating/adding features.

#R2 Thanks for your advice. Please see Q1/Q2&3; we will put the best-performing numbers in bold in the final version.

#R3 Please also refer to Q2&3/4/5. Q8: Motivation for utilizing RL. A8: We aim to choose reliable and informative inter-/intra-modality features to promote 3D classification. RL can meet this challenge by taking the confidence increment as the agent reward and generating dynamic weights for feature enhancement.

Q9: Agent action and how it is used. A9: The intra-agent action is formulated as a weight vector, which is broadcast to the feature shape (B, c, d, w, h) and multiplied with the primary representation to enhance features.
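(A sketch of how such an action might be applied, assuming the intra-agent outputs a per-channel weight vector of shape (B, c) for a feature map of shape (B, c, d, w, h); purely illustrative, not the authors' exact implementation.)

```python
def apply_intra_action(features, action_weights):
    """features: primary representation, torch tensor of shape (B, c, d, w, h);
    action_weights: intra-agent action, torch tensor of shape (B, c)."""
    w = action_weights.view(action_weights.size(0), action_weights.size(1), 1, 1, 1)
    return features * w  # broadcast over (d, w, h) to re-weight each channel
```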

Q10: Evaluation strategy. A10: We adopt 10-fold cross-validation and report the average performance over the 10 folds. For each fold, we further divide the remaining data (the other 9 folds) into training and validation parts and take the model that performs best on the validation part for evaluation.
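(A sketch of this nested evaluation protocol; the validation ratio and seed are illustrative assumptions, not values from the paper.)

```python
from sklearn.model_selection import KFold, train_test_split

def nested_cv_splits(subject_ids, n_folds=10, val_ratio=0.1, seed=0):
    """For each outer fold: hold it out as the test set, split the remaining
    nine folds into train/validation, select the best model on validation,
    and evaluate it on the held-out test fold."""
    outer = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_val_idx, test_idx in outer.split(subject_ids):
        train_idx, val_idx = train_test_split(
            train_val_idx, test_size=val_ratio, random_state=seed)
        yield train_idx, val_idx, test_idx
```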

Q11: About experiments. A11: The input images are preprocessed following M2Net. We train the backbone and the agents with individual optimizers, and the weight decays are set to 1e-5. We set the learning rate and other hyper-parameters empirically. Our task is different from the open challenge, so we select state-of-the-art methods for the same task for comparison.

#R4 Please also refer to Q1/6/7. Q12: Discussion of failure samples. A12: Our task (survival risk prediction) is positively correlated with tumor size and morphology, but there are also clinical factors that lead to failure predictions on some images.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Based on the feedback of the authors and the combined comments of the reviewers, we have decided to accept this paper. The authors have reasonably addressed the comments as far as possible in their rebuttal: motivation, differentiation, and comparison are now slightly more detailed. Visualizations should be included in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have reasonably addressed the comments as far as possible in their rebuttal: motivation, differentiation, and comparison are now slightly more detailed. Despite the limited dataset and 10-fold cross-validation, the paper appears reasonable in its current form for acceptance. Visualizations should be included in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a multi-modal approach for medical image classification driven by reinforcement learning, where two RNN-based agents are trained for both intra- and inter-modality representation learning. The authors' response regarding the novelty of the proposed method against other conventional multi-modal approaches is convincing. Given the difficulty of collecting large-scale multi-modal imaging data, BraTS 2018 is not too bad a choice for validation. In my opinion, the authors' responses to the motivation for using RL for classification and the clarification of the evaluation strategy resolve the previous major concerns raised by Reviewer #2.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


