
Authors

Tianling Liu, Wennan Liu, Lequan Yu, Liang Wan, Tong Han, Lei Zhu

Abstract

Preoperative and noninvasive prediction of the meningioma grade is important in clinical practice, as it directly influences clinical decision making. Moreover, brain invasion in meningiomas (i.e., the presence of tumor tissue within the adjacent brain tissue) is an independent criterion for the grading of meningiomas and influences the treatment strategy. Although efforts have been reported to address these two tasks, most of them rely on hand-crafted features, and there has been no attempt to exploit the two prediction tasks simultaneously. In this paper, we propose a novel task-aware contrastive learning algorithm to jointly predict meningioma grade and brain invasion from multi-modal MRIs. Building on a basic multi-task learning framework, our key idea is to adopt a contrastive learning strategy to disentangle the image features into task-specific features and task-common features, and to explicitly leverage their inherent connections to improve the feature representation for the two prediction tasks. In this retrospective study, an MRI dataset of 800 patients diagnosed with meningioma by pathological analysis (148 high-grade, 62 with brain invasion) was collected. Experimental results show that the proposed algorithm outperforms alternative multi-task learning methods, achieving AUCs of 0.8870 and 0.9787 for the prediction of meningioma grade and brain invasion, respectively. The code is available at https://github.com/IsDling/predictTCL.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_34

SharedIt: https://rdcu.be/cVRtk

Link to the code repository

https://github.com/IsDling/predictTCL

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors developed and implemented a network to simultaneously predict two binary clinical values from MR image data: meningioma grade (low or high) and brain invasion (no or yes). The input was the image data from three types of MR acquisitions (T1 with contrast, FLAIR with contrast, and ADC calculated from MR DWI). Using MR images collected retrospectively from 800 studies, the authors trained and tested their proposed network and compared the results with other networks they also implemented, using quantitative metrics such as sensitivity, specificity, and AUC. The proposed method had the highest values for most measures, and where not highest it was second highest.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of this paper include:

    • Using image data typically collected in MR brain tumor protocols.
    • A good number of studies (n=800) for training and testing for proof of concept.
    • Use of disentanglement contrastive learning layers to split image features into common and specific features to improve the prediction, and verified with ablation testing.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses of this paper include:

    • Why was the input limited to T1c, FLAIR-c, and ADC? The addition of T1 may add more information, as it would help the network learn where contrast was taken up.
    • Given the span of time (5 years) and the number of scanner models, it is unlikely that all 800 studies were performed under the “same scanning parameters”. Acquisition protocol ranges (TE/TR, gradients for DWI, etc.) should be added.
    • Was there any check to make sure the randomly drawn training and testing sets had similar distributions of low/high grade and invasion yes/no?
    • How did all three runs have the same distribution if selected randomly?
    • Only the means of the three training/testing sets are reported; ranges or SDs should also be given.
    • Only cropped ROIs scaled to the same size were used as input - what were the original sizes (rows x cols x slices)? [Maybe this is in the reference.]
    • The tumor type must be known a priori before using the network to predict the grade and presence of invasion, something not usually verified until after biopsy or resection.
    • Not sure mean AUC is a valid measure.
    • Statistical testing between the quantitative metrics could also be done to test for significant differences (e.g., software from http://metz-roc.uchicago.edu could be used to compare the ROC curves of the proposed method to those of the other methods).
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Given that the authors would release the code upon acceptance, the reproducibility of the algorithm is high, especially if a trained version is released. It could be higher with release of the dataset used, but understandably there are probably IRB, HIPAA, or other issues preventing that.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This is very good work based on a fairly large data set from one institution.

    I think future work based on this might benefit from:

    • validating with prospective data not included in the training and testing data;
    • adding other brain tumor types;
    • adding rCBV calculated from MR DSC. If MR DSC was collected for these studies during contrast administration, then rCBV could be added as an input, as it is known to identify invasion in other brain tumors [L. S. Hu et al., “Accurate Patient-Specific Machine Learning Models of Glioblastoma Invasion Using Transfer Learning,” AJNR Am J Neuroradiol, Feb. 2019, doi: 10.3174/ajnr.A5981].
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a strong paper, and although I listed a number of weaknesses, the strengths greatly outweigh the weaknesses I have identified. The weaknesses are listed mostly as points that could be addressed to produce a journal paper.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The authors develop an approach for multimodal multi-task learning for meningioma grade and brain invasion classification. The proposed architecture accepts multimodal inputs to produce a common feature representation that is then disentangled into task-common and task-specific feature vectors. Furthermore, a contrastive loss module is imposed to improve task-specific feature representations and model predictions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors state that this is the first example of multitask learning for meningioma grade and brain invasion prediction.
    • The proposed multitask framework is intriguing and improves model performance. The use of a task-aware contrastive loss is unique and lends itself to multitask learning.
    • Sufficient ablation experiments are provided to support the proposed method for multi-task learning.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is not clear how the common task features are aligned to the task specific features as described on page 5. The justification and method for this alignment should be made clearer. Why are non-task-specific features aligned to a specific task? How is doing so preferable to an initial disentanglement into only task-specific representations without a common task representation?
    • The authors do not have a validation data split for their experiments. This suggests the possibility that hyperparameters were chosen in a manner that would overfit on the testing data split.
    • It is difficult to interpret the true improvement in performance the model provides. The improvements for each ablation experiment are incremental but small, and the model outperforms baselines only by the AUC metric in the invasion prediction task.
    • The difference between the other comparative methods and the authors’ method should be clarified. Why is the authors’ method better than the others? The rationale behind the authors’ method should be made clear.
    • The authors mention that they adapt MMoE. The authors should clearly explain the difference between their method and the MMoE method. Which part is different, and why was the change made?
    • The technical novelty should be better described.
    • ‘Moreover, the accuracy of brain invasion determination heavily depends on the clinician’s experience.’ -> Is there a reference supporting this claim?
    • How are the three conv and avg pooling operations different? This is not clear.
    • Paragraph before the conclusion -> It is not clear whether this improvement is statistically significant. In fact, the meningioma prediction performance decreases. It is premature to draw such a conclusion based on these experiments alone. I have the same concern with the subsequent addition of modules in the ablation experiments. Please perform a statistical significance test and report whether these are indeed significant changes; this might just be a function of the dataset split.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and data do not seem to be made available for this work. However, the network architecture is described in sufficient detail for replication if the additional description of how the task-common features are aligned to the specific tasks is included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. There are instances in which the methods section could be made clearer. Most significantly, further justification for the contrastive loss and the alignment of the task-common representation should be provided, in addition to details on how that alignment is performed. Furthermore, a rephrasing or additional discussion of the purpose of the auxiliary classification loss should be added for clarity.

    2. In section 2.1, the authors mention that they adopt a convolution layer and an average pooling to realize feature disentanglement. What is the rationale behind this?
    3. In section 2.2, the authors mention that they align the task-common feature to the task-specific feature. My understanding of task-common and task-specific features is that they do not intersect with each other. Why do the authors think the task-specific features can be transformed from task-common features? Can the authors justify this?
    4. If the task-common features can be transformed into two different types of task-specific features (one for invasion, one for grading), why not use different features at the beginning of the feature extraction step? My concern is that, since we want to generate two task-specific features, why do we need to generate them from task-common features?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel approach for multi-task learning and perform appropriate ablation and baseline experiments. Their results demonstrate overall improved performance, especially in the multi-task setting. There are some instances where their methodology could be further explained or justified, which should be easily amendable. The lack of a validation split may have implications for the generalizability of their results and subsequent performance on independent datasets. However, this would be expected to be uniform across all ablation experiments and baselines.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    Though the authors have attempted to address the concerns raised in the reviews, some of the major concerns, such as the rationale/technical novelty, are still not adequately addressed. Simply stating the assumption as “… task-specific features work for a single task, and task-common features can help both tasks” does not address this concern. Another reviewer also points out the issue of cross-validation: just averaging over three runs and getting a low std may not be indicative of robustness. Additionally, was the data distribution preserved while randomly dividing the dataset? The experiments could have been repeated multiple times in a 3-fold setting. In rebuttal Q8, instead of providing the rationale, the authors delve into the details of the method. My final opinion reflects these concerns.



Review #4

  • Please describe the contribution of the paper

    The goal of this paper is to present a novel model for joint prediction of meningioma grade and brain invasion from multi-modal MRIs. A multi-task learning approach is proposed that derives task-common and task-specific features from a shared encoder. A contrastive learning strategy is used to align the task-common features for each task and enforce similarity between feature embeddings contributing to the same task. The method is evaluated on a private database of 800 multi-modal MRIs (T1, FLAIR, ADC). The dataset is imbalanced, with most meningiomas being low grade. Results are promising, with interesting AUCs for both tasks, although a bit lower for meningioma grade prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clinical interest (it could avoid invasive assessment by biopsy).
    • Simplicity of the method
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Rather weak statistical analysis that does not really show that the proposed method is more adequate than simpler ones (Table 3) or other approaches (Table 2).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is well described and code will be published. Reproducibility is good even if the dataset is not public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is interesting from a clinical point of view, and the proposed method is simple yet not trivial. But the rather weak statistical analysis does not make the paper completely convincing. Generalization is evaluated by splitting the dataset randomly into training/validation sets three times (and only the mean AUC on validation is reported); repeated cross-validation could have been used for this purpose. This makes it difficult to evaluate the improvement of the proposed method over the state of the art or simpler approaches, as the numbers (e.g., AUC) are quite close. Significance tests (e.g., DeLong’s) could have been used. Some reported metrics like accuracy are not really relevant for such imbalanced datasets; other ones (MCC) could have been considered, and confidence intervals should be given as well.
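
    As a minimal sketch of the kind of analysis suggested here (MCC plus a bootstrap confidence interval for AUC); the function name, threshold, and bootstrap settings are illustrative assumptions, not part of the paper:

    ```python
    # Hedged sketch: MCC at a fixed threshold and a 95% bootstrap CI for AUC.
    import numpy as np
    from sklearn.metrics import matthews_corrcoef, roc_auc_score

    def mcc_and_auc_ci(y_true, y_score, threshold=0.5, n_boot=2000, seed=0):
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        mcc = matthews_corrcoef(y_true, (y_score >= threshold).astype(int))
        rng = np.random.default_rng(seed)
        n = len(y_true)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)          # resample cases with replacement
            if len(np.unique(y_true[idx])) < 2:  # skip single-class resamples
                continue
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(aucs, [2.5, 97.5])  # 95% percentile interval
        return mcc, (lo, hi)
    ```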

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach is novel and elegant and yet simple to implement. Yet the validation is too weak to be completely convincing. The benefit of adding task-common features and contrastive losses is not clear.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a novel task-aware contrastive learning algorithm to jointly predict meningioma grade and brain invasion from multi-modal MRIs. Based on a basic multi-task learning framework, the authors adopt a contrastive learning strategy to disentangle the image features into task-specific features and task-common features, and explicitly leverage their inherent connections to improve the feature representation for the two prediction tasks. The reviewers agreed that the approach has merit and addresses an important clinical question. However, some concerns were raised with regard to the experimental design. Please address the concerns regarding the hold-out validation set, the incremental improvement in performance, the comparative strategies, and the technical novelty.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We appreciated the favorable comments on the important clinical problem (AC, R3, R4) and the novelty of our method (AC, R2, R3). Below, we clarify the main issues raised by the reviewers.

Q1: Data division and validation dataset (AC, R3, R4) A1: Due to the small number of invasion samples, we use randomly drawn data divisions to alleviate overfitting. The mean and standard deviation over three runs are reported as the final results, e.g., invasion AUC (EFMT: 0.955+-0.0305, MFMT: 0.953+-0.0244, MMoE: 0.943+-0.0369, MAML: 0.967+-0.0287, Ours: 0.979+-0.0136). Our std is relatively small, indicating that our method is stable.

Q2: Statistical analysis (AC, R2, R3, R4) A2: We computed the MCC metric and the confidence intervals of all metrics, e.g., meningioma MCC (EFMT: 0.371+-0.0490, MFMT: 0.533+-0.0462, MMoE: 0.367+-0.0368, MAML: 0.468+-0.0571, Ours: 0.586+-0.0535). We also calculated the AUC differences between the compared methods and ours using ROC-kit; all p-values are less than 0.05.
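
For readers without access to ROC-kit, a paired bootstrap over the same test cases is a simple alternative for testing an AUC difference between two models. A minimal sketch under that assumption follows; all names are illustrative, and this is not the ROC-kit procedure itself:

```python
# Hedged sketch: paired bootstrap test for the AUC difference of two models
# scored on the same test cases.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_diff_pvalue(y_true, scores_a, scores_b, n_boot=5000, seed=0):
    y_true, scores_a, scores_b = map(np.asarray, (y_true, scores_a, scores_b))
    observed = roc_auc_score(y_true, scores_a) - roc_auc_score(y_true, scores_b)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # identical resample for both models
        if len(np.unique(y_true[idx])) < 2:
            continue
        diffs.append(roc_auc_score(y_true[idx], scores_a[idx]) -
                     roc_auc_score(y_true[idx], scores_b[idx]))
    diffs = np.asarray(diffs)
    # two-sided p-value: how often the bootstrap difference crosses zero
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)
```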

Q3: Incremental improvement in ablation study (AC, R3, R4) A3: Besides AUC, we computed the MCC metric: invasion (0.538, 0.621, 0.610, 0.597, 0.625), a 0.087 improvement; grading (0.494, 0.516, 0.490, 0.527, 0.586), a 0.092 improvement.

Q4: Technical novelty and rationale of our method (AC, R3) A4: We exploit task-common features for multi-task learning, based on the assumption that task-specific features serve a single task, while task-common features can help both tasks. To achieve this goal, we propose task-aware contrastive learning (CL) to guide task-common features to boost task-specific features. Also note that, before our work, CL was applied at the image level and the pixel level; we are the first to explore task-level CL.

Q5: Experimental data (R2) A5: Following clinical practice, we use three modalities in our work. During experiments, we randomly select training and testing sets in all three runs such that the training and testing sets have similar distributions of low/high grade and invasion yes/no.
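
A minimal sketch of such a stratified split, assuming (purely for illustration) that the two binary labels are combined into a joint code; the variable names and split ratio are not from the paper's code:

```python
# Hedged sketch: three stratified train/test splits that preserve the joint
# distribution of grade (low/high) and invasion (no/yes).
import numpy as np
from sklearn.model_selection import train_test_split

num_cases = 800
grade = np.random.randint(0, 2, num_cases)     # placeholder labels
invasion = np.random.randint(0, 2, num_cases)  # placeholder labels
joint = 2 * grade + invasion                   # joint code in {0, 1, 2, 3}

for seed in (0, 1, 2):  # three runs, each stratified on the joint label
    train_idx, test_idx = train_test_split(
        np.arange(num_cases), test_size=0.2,
        stratify=joint, random_state=seed)
```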

Q6: Benefit of task-common feature disentanglement (R3, R4) A6: We think that the extracted features contain task-specific features (for addressing each specific task) as well as task-common features (capable of addressing both tasks). Most existing multi-task methods produce only task-specific features, while we exploit the task-common features to further enhance the accuracy of both classification tasks. To do so, we devise a contrastive learning scheme that converts the task-common features into two features for predicting the two tasks respectively, with the aid of two auxiliary branches. Then, we combine these two features with the two task-specific features. In this way, we can further decrease the intra-task feature distance and increase the inter-task feature distance, which boosts meningioma grade and brain invasion classification accuracy, as demonstrated in our experiments.
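
A minimal sketch of a contrastive term in this spirit (not the paper's exact loss; the InfoNCE-style form, temperature, and tensor names are assumptions) that pulls a task-common-derived feature toward the matching task-specific feature and pushes it away from the other task's feature:

```python
# Hedged sketch: task-aware contrastive term over batched feature vectors.
import torch
import torch.nn.functional as F

def task_contrastive_loss(g_common, g_pos, g_neg, temperature=0.1):
    """g_common: task-common-derived features (B, D); g_pos: matching
    task-specific features; g_neg: the other task's features."""
    pos = F.cosine_similarity(g_common, g_pos) / temperature  # (B,)
    neg = F.cosine_similarity(g_common, g_neg) / temperature  # (B,)
    logits = torch.stack([pos, neg], dim=1)                   # (B, 2)
    target = torch.zeros(pos.size(0), dtype=torch.long,
                         device=pos.device)                   # positive at 0
    return F.cross_entropy(logits, target)
```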

Q7: MMoE adaptation (R3) A7: The adapted MMoE is used for comparison and is not the baseline of our method. The original MMoE was designed for text processing, and we adapt it for image processing by using ResNet as the feature backbone. In contrast, our method uses task-common features to boost both tasks (see Q6) and employs contrastive learning to achieve this boosting.

Q8: Rationale behind the feature disentanglement mechanism (R3) A8: The feature disentanglement is to extract task-specific and task-common features for the two classification tasks. Three sets of a convolution layer followed by average pooling are used to transform the concatenated feature embedding into the feature vectors G_i, G_c, G_m; the three sets do not share weights. To realize feature disentanglement, we actually rely on the subsequent task-aware contrastive learning as well as the two auxiliary classification branches, which impose soft and hard constraints on the physical meaning of G_i, G_c, G_m.
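
A minimal sketch of these heads (assuming 3D features and illustrative channel sizes; the class and attribute names are not from the released code):

```python
# Hedged sketch: three non-weight-sharing conv + global-average-pooling
# branches mapping the concatenated embedding to vectors G_i, G_c, G_m.
import torch.nn as nn

class DisentangleHeads(nn.Module):
    def __init__(self, in_ch=512, dim=256):
        super().__init__()
        def head():
            return nn.Sequential(
                nn.Conv3d(in_ch, dim, kernel_size=1),
                nn.AdaptiveAvgPool3d(1),  # global average pooling
                nn.Flatten())             # -> (B, dim)
        self.head_i = head()  # invasion-specific branch -> G_i
        self.head_c = head()  # task-common branch       -> G_c
        self.head_m = head()  # grade-specific branch    -> G_m

    def forward(self, feat):  # feat: (B, in_ch, D, H, W)
        return self.head_i(feat), self.head_c(feat), self.head_m(feat)
```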

Q9: Other comments (R2, R3) A9: The difficulty of brain invasion diagnosis is supported by Zhang et al., EBioMedicine 2020.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors did a reasonable job of addressing some of the comments raised by the reviewers. However, following the rebuttal, reviewer 2 still had concerns regarding the rationale/technical novelty not being adequately described. Please address these lingering concerns in the final submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper is well-written and the main idea is clearly conveyed. The use of task-specific and task-common features is interesting. For the results, as the dataset is highly unbalanced, e.g., only 62 invasive samples out of 800, the sensitivity should be the most expressive metric in the tables. Therefore, the ablation results in Table 3 show that the task-common branch is the most important, while the auxiliary branch and contrastive loss are probably unnecessary. This is good enough for this paper to be useful to other researchers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal reasonably addresses the comments raised by the reviewers on the initial submission. The combination of task-specific and task-common features is interesting. Bolstering AUC with MCC is reasonable (though the statistical analysis using only 3 runs is not ideal); given the class imbalance, AUPRC should have been reported as well to ensure the minority class is reasonably accurately classified.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


