
Authors

Han Liu, Yubo Fan, Hao Li, Jiacheng Wang, Dewei Hu, Can Cui, Ho Hin Lee, Huahong Zhang, Ipek Oguz

Abstract

Multiple Sclerosis (MS) is a chronic neuroinflammatory disease and multi-modality MRIs are routinely used to monitor MS lesions. Many automatic MS lesion segmentation models have been developed and have reached human-level performance. However, most established methods assume the MRI modalities used during training are also available during testing, which is not guaranteed in clinical practice. Previously, a training strategy termed Modality Dropout (ModDrop) has been applied to MS lesion segmentation to achieve the state-of-the-art performance with missing modality. In this paper, we present a novel method dubbed ModDrop++ to train a unified network adaptive to an arbitrary number of input MRI sequences. ModDrop++ upgrades the main idea of ModDrop in two key ways. First, we devise a plug-and-play dynamic head and adopt a filter scaling strategy to improve the expressiveness of the network. Second, we design a co-training strategy to leverage the intra-subject relation between full modality and missing modality. Specifically, the intra-subject co-training strategy aims to guide the dynamic head to generate similar feature representations between the full- and missing-modality data from the same subject. We use two public MS datasets to show the superiority of ModDrop++. Source code and trained models are available at https://github.com/han-liu/ModDropPlusPlus.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_43

SharedIt: https://rdcu.be/cVRyX

Link to the code repository

https://github.com/han-liu/ModDropPlusPlus

Link to the dataset(s)

https://smart-stats-tools.org/lesion-challenge


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a deep-learning-based dynamic filter network for Multiple Sclerosis (MS) lesion segmentation. The authors propose a dynamic head with filter scaling and intra-subject co-training for the scenario in which some modalities may be unavailable during training and testing in clinical practice. The proposed method can adapt to any arbitrary number of MRI input modalities for the automatic MS lesion segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents a solution for the clinical setting in which one or more MRI input sequences are absent during the training and inference phases of a deep-learning method. A specifically designed dynamic head scales the input, and intra-subject co-training enhances the network's ability to learn similar features for different combinations of input sequences. Extensive experiments covering all possible input scenarios were conducted. The demonstration of the proposed method and the writing are clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The design of the dynamic head is trivial and lacks proof: the filter scaling matrix serves essentially as a normal linear layer without further modification. It is therefore hard to justify the usefulness of such a matrix for scaling the features extracted with missing modalities during the training and inference phases.

    The losses for features extracted after the dynamic head are weighted equally; however, features extracted from more available modalities should carry more confidence. The use of the SSIM loss seems inappropriate for features in latent space; why not use a perceptual loss? Also, the reason for using fixed layers after feature extraction was not justified.

    The experimental results further support my opinion: the performance of the proposed method dropped significantly when FLAIR was absent from training and inference, while performance was comparable to the full-modality setting when only FLAIR was available. This observation only shows that FLAIR is crucial for MS lesion segmentation tasks, not the effectiveness of the proposed method. It is indeed true that FLAIR is important for lesion identification in real clinical settings, and the proposed method was not able to gain a performance improvement. Furthermore, even though a Contrast-Enhanced (CE) sequence was provided in the dataset, I do not think this sequence can be used in this setting, since CE is used to detect active lesions. For the general lesion segmentation task, the information provided by CE for an MS lesion can differ (bright for an active lesion, dark for an MS lesion), so including this sequence may lead to a negative performance gain (as shown in Table 1).

    The results on the ISBI dataset showed no significant performance difference between the full- and semi-modality settings. Discussion of this observation and of the performance differences between the two datasets was not provided.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided sufficient information to reproduce the proposed method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors defined a good problem based on a real clinical setting; however, the proposed solution was not good enough to address it.

    The authors should consider a more convincing approach to scaling features from the missing-modality setting to the full-modality setting. Features from the dynamic head could be treated differently based on their confidence.

    The experiments should be designed more carefully: since the problem is inspired by a real clinical perspective, the experimental design should follow it. The choice of modalities included in the experiments should be reconsidered and discussed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper tried to address a realistic problem in clinical settings for MS lesion segmentation; however, the proposed solution was trivial and lacked justification. In addition, the experiments were extensive but not well designed. The experimental results did not support the effectiveness of the proposed method.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors introduce a novel framework named ModDrop++ to train a segmentation network with missing input MRI sequences. This approach can easily be integrated into any existing convolutional neural network and, compared to the state-of-the-art method ModDrop, shows improved performance for multiple sclerosis lesion segmentation on two publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The proposed approach is designed to be a plug-and-play method that can easily be integrated into any existing segmentation CNN.
    - For the first time, dynamic filters are applied to the missing-modality problem, showing interesting performance.
    - A novel strategy for intra-subject co-training is proposed to leverage the intra-subject relation between the full- and missing-modality data. Ablation studies show that this further improves segmentation performance.
    - Missing modalities are a persistent problem in MS imaging and, if confirmed by further studies, this method could have a significant impact on automated lesion segmentation tools.
    - The manuscript is well written and straightforward to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The evaluation of the proposed method is somewhat weak. Two publicly available datasets are considered, but the ISBI one includes only 5 patients. Throughout the manuscript there is no mention of a validation dataset. The core objective of this work is to increase the generalizability of MS lesion segmentation approaches; thus, the two datasets could have been pooled to examine the effects of the proposed framework.
    - The lesion delineation heavily depends on the sequences analyzed by the experts while performing the manual lesion annotation. Very often the only sequence used for MS white matter lesions is the FLAIR, and thus results of automated approaches are considerably worse when FLAIR is the missing modality. This should at least be discussed in the manuscript, as it is quite evident from Table 1.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code and data are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    - The authors divide the ISBI dataset with a 4:1 ratio for training and testing, meaning that a single patient was kept for testing (5 patients in total). This makes the results obtained much less meaningful and should at least be acknowledged as a limitation.
    - No indication is given of how the UMCL dataset is split into training and testing.
    - A previous MICCAI paper from 2020 employed ModDrop in the context of MS lesion segmentation and should be cited as well: 10.1007/978-3-030-59719-1_57.
    - The validation set is not mentioned for either dataset; how is the binary threshold optimized?
    - Regarding the SSIM loss, was the window size of 11x11 voxels chosen empirically or optimized somehow? Similarly, the values of α, β and γ in the overall loss function could be optimized.
    - Fig. 2: The choice of colors could be improved (e.g., swapping FP and TP, as green is commonly seen as a positive outcome).
    - The Discussion section lacks the limitations of this study and future research directions.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework shows novelty and could be of interest to researchers in the field. Its evaluation could be improved to strengthen the conclusions.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes an improvement to the Modality Dropout (ModDrop) training technique for multimodal image segmentation, adding a dynamic filter convolutional layer (head) coupled with a modality-specific weighting strategy and further incorporating intra-subject co-training (full modality vs. missing modality).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • novel application of a dynamic filter network and a modality weighting strategy to the ModDrop method of multimodal training
    • novel application of co-training to improve segmentation results

    What I like about this approach is the way the filter scaling matrices adaptively adjust to missing modalities. This is a much better way to handle missing modalities than the original ModDrop approach, which learns a fixed set of filter coefficients for all possible cases of missing modalities.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I didn’t notice any major weakness in the paper. The paper is well written and references have been made to relevant prior work.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Public datasets have been used, but no code repository has been shared. I encourage the authors to publicly share the implementation, as I feel this work could have a noticeable impact in the field.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Fig. 2: The figure should be enlarged, as it is difficult to see the results; it could be presented in the Supplement as a full-page or half-page figure. Also, the description/interpretation of the results, along with the GT, should be improved. It is unclear what the sub-image at the top right of each result image represents. Please clarify.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes interesting improvements for the problem of missing modalities which is a real problem in medical imaging.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work presents some novelties and has shown interesting performance in dealing with cases of missing modalities. However, there are some concerns regarding the justification of the designed approach as well as the experimental settings and evaluations. These points should be improved.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank all the reviewers for the positive feedback and constructive comments. Here, we would like to address the concerns raised by the reviewers.

Method

  • The filter scaling matrices learned by the dynamic head serve to adjust the importance of the kernels according to each missing condition. As mentioned in the paper, our dynamic head is indeed a simple, lightweight 2D convolutional layer with a 1×1 kernel. Note that even though each convolutional kernel is only scaled linearly, the kernels are scaled differently under different missing conditions, which is the major motivation of our design. The effectiveness of the designed dynamic head can be seen in both Table 1 and Table 2, where MD+ (dynamic head w/o co-training) consistently outperformed the vanilla MD (w/o dynamic head and w/o co-training) in all missing conditions.
  • The concept of confidence/uncertainty is interesting. We agree that the features extracted from the fewer-modality data are more likely to be less confident. In future work, we will investigate whether treating features obtained by dynamic head differently could further improve the performance. We thank the reviewer R1 for this suggestion.
  • The motivation for using the SSIM loss is two-fold. First, a recent study, ‘Model Pruning Based on Quantified Similarity of Feature Maps’, demonstrated the effectiveness of the SSIM loss for computing feature-map similarity (in latent space) for model pruning. Second, as mentioned in the paper, the features used to compute similarities are extracted at the first convolutional layer and are thus mostly low-level features such as edges, which can be nicely captured by SSIM. Besides, we compared the MSE, KL, and SSIM losses in our experiments but did not include the results due to page limits. Briefly, we found that SSIM outperformed MSE and KL. A full analysis of the similarity-loss comparison (including the perceptual loss) will be included in our extended journal paper.
  • Our study serves as a launching point for leveraging dynamic networks for missing-modality problems. We showed that by making only the first convolutional layer ‘dynamic’ (the remaining layers are fixed), the network already outperforms the vanilla ModDrop. It is possible to introduce dynamic components to more layers, but these explorations are not included in this study.
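The filter-scaling idea described above can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions (the names `W_head` and `base_kernels`, and the kernel/modality counts, are hypothetical and not the paper's implementation): a small linear map turns the binary modality-availability code into one scaling factor per kernel of the first convolutional layer, so the same base kernels are re-weighted differently for each missing condition.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODALITIES = 4   # e.g., T1w, T2w, PD, FLAIR
N_KERNELS = 16     # kernels in the first convolutional layer

# Shared base kernels of the first layer (one 1x1 weight per input
# modality per kernel).
base_kernels = rng.standard_normal((N_KERNELS, N_MODALITIES))

# Hypothetical dynamic head: a linear map from the binary
# modality-availability code to one scaling factor per kernel.
W_head = rng.standard_normal((N_KERNELS, N_MODALITIES))
b_head = np.ones(N_KERNELS)

def scaled_kernels(availability):
    """Scale each base kernel by a factor conditioned on which modalities are present."""
    availability = np.asarray(availability, dtype=float)
    scales = W_head @ availability + b_head      # one scale per kernel
    return scales[:, None] * base_kernels        # shape (N_KERNELS, N_MODALITIES)

full = scaled_kernels([1, 1, 1, 1])      # all modalities available
no_flair = scaled_kernels([1, 1, 1, 0])  # FLAIR missing

# Each kernel is only scaled linearly, yet different missing conditions
# yield differently scaled kernels.
assert full.shape == (N_KERNELS, N_MODALITIES)
assert not np.allclose(full, no_flair)
```

In a real network the scaling would be applied to the first layer's convolution weights before the forward pass; the point of the sketch is only that the per-condition scales, not the base kernels, vary with the availability code.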

Experiments and Results

  • As pointed out by the reviewers (R1 and R2), FLAIR is indeed very crucial for MS lesion segmentation, and we noticed the performance drop when FLAIR is missing. We will emphasize this observation in the results and discuss the performance difference across two datasets in our camera-ready version.
  • The ISBI dataset is a longitudinal dataset (usually 4-5 time points per subject, for a total of 21). While a larger dataset would evidently be desirable, we note that the ISBI dataset is very commonly used as a benchmark for MS lesion segmentation studies. Nevertheless, in the camera-ready version, we will discuss the dataset size limitation. We will also clarify the data split of the UMCL dataset, which follows the same ratio as the ISBI dataset. For both datasets, we further split the training set at a ratio of 3:1 into training and validation sets.
  • The window size of the SSIM loss was selected empirically and the weighting factors of the loss terms were selected based on the validation set.
  • We agree that CE-T1w does not necessarily match the information from the other modalities. Nevertheless, some datasets such as UMCL include it, whereas others like ISBI do not, which makes it well suited as an example of inconsistent availability in real clinical practice. Our experiments show that our model is able to use this modality when available, without suffering a substantial performance loss when it is not.
  • We will cite the paper 10.1007/978-3-030-59719-1_57 (R2).
  • Our source code and trained models will be made available on GitHub.
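The SSIM similarity discussed in the rebuttal can be illustrated with a simplified, single-window version of the SSIM statistic. This is our own sketch, not the authors' code: the paper uses an 11x11 sliding window that averages the statistic over local patches, and the function name and constants below are illustrative defaults. The intra-subject co-training loss would then be 1 - SSIM between the full- and missing-modality feature maps.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two feature maps.

    Simplified stand-in for a windowed SSIM: the standard formula
    evaluated once over the whole map instead of per 11x11 patch.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
f_full = rng.random((32, 32))                              # full-modality features
f_missing = f_full + 0.1 * rng.standard_normal((32, 32))   # missing-modality features

# Identical maps score exactly 1; a co-training loss of 1 - SSIM would
# push the missing-modality features toward the full-modality ones.
assert abs(ssim_global(f_full, f_full) - 1.0) < 1e-9
assert 0.0 < ssim_global(f_full, f_missing) < 1.0
```

Because SSIM compares local means, variances, and covariance, it is sensitive to edge-like structure, which matches the rebuttal's argument that first-layer features are mostly low-level edges.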


